Analysis of Recombination in Campylobacter jejuni from MLST Population Data

Paul Fearnhead1,5, Nick Smith1, Mishele Barrigas2, Andrew Fox3 and Nigel French4

1. Department of Mathematics and Statistics, Lancaster University, Lancaster, LA1 4YF, UK
2. University of Staffordshire, Stafford, UK
3. Health Protection Agency North West, Manchester Medical Microbiology Partnership, Manchester Royal Infirmary, Manchester, UK
4. Institute of Veterinary, Animal and Biomedical Sciences, Massey University, New Zealand
5. Correspondence should be addressed to Paul Fearnhead. (e-mail: [email protected]).

Summary: We analyse recombination in C. jejuni using MLST data from isolates taken from wild birds, cattle, wild rabbits and water in a 100 km² study region in Cheshire, UK. We use a recent approximate likelihood method for inference, based on combining likelihood information from all pairs of segregating (polymorphic) sites in the data. We find substantial evidence for recombination, but only for recombination with short tract lengths, of around 225bps–750bps. We estimate that the rate of recombination is of a similar magnitude to the rate of mutation.

Keywords: Bacteria, Campylobacter, Helicobacter, MLST, Recombination Rate, Recombination Tract Length

1 Introduction

Campylobacter jejuni is one of the most common agents of bacterial gastroenteritis. It is carried asymptomatically by many domestic and wild animals, and is also present in environmental locations such as soil and water. We present an analysis of multilocus sequence typing (MLST) data from C. jejuni isolates taken from a range of animal and environmental sources (French et al., 2004), with the aim of learning about the recombination process of the bacteria. Recombination is a major source of genetic diversity in pathogens, and perhaps the principal one.
Learning about the recombination process in pathogens is important for understanding the epidemiology of disease, and therefore ultimately for its control and prevention. For example, recombination has been responsible for capsular polysaccharide switching in Neisseria meningitidis and Streptococcus pneumoniae, which may result in vaccine failures (Claus et al., 2002). Within C. jejuni, recombination has been responsible for substantially increased genetic diversity at the unc and pgm loci (French et al., 2004).

There have been previous analyses of MLST data from C. jejuni (Suerbaum et al., 2001), but these have primarily focussed on evidence for recombination, and on qualitative measures of the amount of recombination such as the homoplasy ratio. We present a likelihood-based analysis of the data, using the likelihood information from all pairs of segregating sites. Likelihoods are calculated under a coalescent model. We present formal tests for the presence of recombination, and estimates of both the amount of recombination relative to mutation and the mean recombination tract length. We find substantial evidence of recombination, with mean tract lengths in the region of 225bps–750bps, and with recombination occurring at a similar rate to mutation.

2 Material and Methods

Campylobacter Data. The data is taken from French et al. (2004), and consists of multilocus sequence types (MLSTs) from 172 isolates of Campylobacter jejuni. The isolates were taken from a variety of farm and wildlife sources over a 100 km² area of predominantly dairy farmland. For each isolate the data consist of the DNA sequence for approximately 500bp fragments of 7 house-keeping genes which are randomly located along the 1.6Mb genome. A small number of isolates had an allele at one of the genes which is substantially divergent from the majority of the alleles at that gene, and may have originated in a different Campylobacter species.
(For example, Colles et al., 2003, suggest that allele 17 at gene unc is from C. coli.) These were allele 113 at gene pgm (found in 1 rabbit and 1 water isolate), and alleles 17, 38 and 42 at gene unc (found in 18 cattle isolates). A summary of the data is given in Table 1.

Genetic Diversity of C. jejuni Isolates.

Gene      Fragment length (bp)   Polymorphic sites   Distinct haplotypes   Tajima's D
asp       477                    19                  14                     0.57
gln       477                    34                  25                    -0.69
glt       402                    23(1)               23                     0.31
gly       507                    42(1)               24                     1.88†
pgm       498                    94(10)              29                    -0.38
pgm∗                             47(2)               28                     1.77†
tkt       459                    43(2)               29                     0.03
unc       489                    81(5)               20                     0.43
unc∗                             27(1)               17                    -0.62
Total     3,309                  336(19)             65                     0.22
Total∗                           233(7)              58                     0.60

Table 1: Summary of genetic diversity of the C. jejuni isolates. The numbers of polymorphic sites with three or four alleles for each gene are given in brackets. ∗ Excluding isolates with inter-strain alleles. For pgm, 2 isolates with allele pgm113 were removed; for unc, 18 isolates with alleles unc17, unc38 or unc42 were removed; while for Total, all these 20 isolates were removed. † Significantly different from 0 at the 5% level.

Helicobacter Data. To validate our methods, we also analysed MLST data from Helicobacter pylori, for which there is independent inference about the recombination process (Falush et al., 2001). We analysed data at 5 loci (atpA, efp, ppa, trpC and ureI) for 18 isolates collected from New Zealand Maoris, and 12 isolates collected from Korea. The data was obtained from the pubMLST isolate database (http://pubmlst.org/helicobacter/).

Approximate Likelihood Methods. We used an extension of the pairwise likelihood method of Hudson (2001) to include a finite-sites mutation model (McVean et al., 2002). This is an approximate likelihood method which (i) calculates the likelihood for each pair of polymorphic sites; and (ii) multiplies together the likelihoods for all such pairs of sites. The approximation in this method is that (ii) would only be valid if the data from each pair of sites were independent, which is clearly not the case. The likelihoods in (i) are calculated using computational statistical techniques (Fearnhead and Donnelly, 2001), assuming a neutral, constant-sized, panmictic population coalescent model. Our mutation model is based on a 4-allele model at each site, with mutations between all pairs of alleles being equally likely, and a constant mutation rate across all sites. While this model is simplistic, simulation studies show that inferences are robust to deviations from the model assumptions (Smith and Fearnhead, 2004; McVean et al., 2004). While theory is limited for the pairwise likelihood approach (see Fearnhead, 2003, for some results), it has been widely used for analysing recombination processes from population data (see Stumpf and McVean, 2003; Awadalla, 2003), due to its speed of computation and the flexibility it allows for the underlying recombination process (e.g. variable recombination rates, and distributional assumptions for tract lengths).

Testing for Recombination. Our data contains information about the recombination rates over both short (within genes, up to 500bp) and long (between genes, of the order of 40–800kb) distances. Using permutation tests we tested the hypotheses of (i) the presence of recombination in C. jejuni and (ii) the presence of recombination acting over large distances. To test (i) we fitted a model in which the recombination rate scales linearly with physical distance. The pairwise likelihood for all pairs of polymorphic sites within each gene was then maximised under this model. To calculate the significance of this value, we then analysed 1,000 permuted data sets, in which we shuffled the positions of the polymorphic sites. To test (ii) we fitted a model in which the recombination rate between sites in different genes, i and j, is

$$\rho_{ij} = a + b\,d_{ij}\,(G - d_{ij}),$$

where $d_{ij}$ is the distance between genes i and j, and G is the size of the genome (1.6Mb).
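As a minimal sketch of the between-gene model just defined, the function below evaluates the rate for a given pair of genes, and a second helper shows one common way to turn a set of permuted-data statistics into a p-value. The function names, the example parameter values and the p-value convention (fraction of permuted statistics at least as large as the observed one) are assumptions made for illustration; the text does not specify them.

```python
GENOME_SIZE_BP = 1_600_000  # G in the text (1.6Mb)


def between_gene_rate(a, b, d, genome_size=GENOME_SIZE_BP):
    """Return rho_ij = a + b * d_ij * (G - d_ij).

    a: constant term, the same for every pair of genes
    b: coefficient of the distance-dependent term
    d: distance between the two genes, in bp
    """
    return a + b * (d * (genome_size - d))


def permutation_p_value(observed, permuted):
    """Fraction of permuted-data statistics at least as large as the observed one.

    This convention is an assumption; the exact formula is not given in the text.
    """
    at_least = sum(1 for value in permuted if value >= observed)
    return at_least / len(permuted)
```

Note that the distance-dependent term is symmetric in d and G − d, as expected for a circular genome, and is largest when the two genes are half a genome apart.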
The constant term in this expression reflects the effect of recombination with short tract lengths, assuming that the tract lengths are smaller than the distances between the genes. The effect of such recombination will be the same between any pair of genes. The $b\,d_{ij}(G - d_{ij})$ term is consistent with recombination where the two breakpoints are uniformly distributed about the genome. We maximised the pairwise likelihood for all pairs of polymorphic sites in different genes under this model. To assess the significance of this value we then analysed permuted data sets obtained by shuffling the positions of the 7 genes.

The power of both these tests is unknown, but should be comparable. If anything the power for testing (ii) may be larger, as there are more pairs of sites in different genes than within the same gene. While for each test we assume a specific model for the relation of recombination to physical distance, it is hoped that these models will give us power to detect a range of models which are consistent with the presence of (i) recombination or (ii) recombination acting over long ranges.

Recombination Model. Our model for recombination, assuming that recombination tract lengths are small compared to the size of the genome, is as follows. Let X be the (random variable of the) tract length of a recombination event, and ρ be the rate at which recombination events occur (per bp). Then the recombination rate for two sites a distance y apart (where y is much smaller than the size of the genome) is

$$2\rho \sum_{x=1}^{y} \Pr(X \ge x).$$

The factor of 2 appears due to our definition of ρ, as for each recombination event there will be two recombination breaks. The sum in this expression appears due to the need for precisely one of the recombination break points to lie between the two sites. Using evidence from Drosophila (Hilliker et al., 1994), it is common to assume X has an exponential distribution (e.g.
Wiuf and Hein, 2000; Falush et al., 2001), and in this case our expression simplifies to previously derived expressions (see for example Frisse et al., 2001). However, in order for our inferences to be robust to deviations from this model, we assume that X has a discretised gamma distribution, which is a generalisation of the exponential distribution. Let µ be the mean tract length and α be the shape parameter of the gamma distribution. We calculate the pairwise log-likelihood $l(\rho, \mu, \alpha)$ over a grid of ρ, µ and α values, but use the profile log-likelihood

$$pl(\rho, \mu) = \max_{\alpha}\, l(\rho, \mu, \alpha)$$

to perform inference. This approach accounts for uncertainty in the distribution of the tract length when making inference for ρ and µ.

Confidence Intervals. To obtain an approximate confidence interval for µ (and similarly ρ) we used the following scaled likelihood ratio statistic:

$$LR(\mu) = \frac{2\,\bigl(pl(\hat{\mu}) - pl(\mu)\bigr)}{S},$$

where $pl(\mu) = \max_{\rho}\, pl(\rho, \mu)$ is the profile log-likelihood for µ, and S is the number of segregating sites. The scaling is to account for the fact that the number of pairs of sites increases quadratically with S, whereas the amount of information should increase at best linearly with S. We use simulation to obtain the empirical distribution of the statistic. Data was simulated (see below) under a fitted model, where the parameters of the recombination process were fixed at their maximum likelihood estimates (mles), and the mutation process was chosen to produce (on average) the same number of segregating sites as in the data. For each simulated data set we obtained a value of the scaled likelihood ratio statistic evaluated at the true parameter value. Repeatedly simulating data and calculating the scaled likelihood ratio statistic enables us to build up an empirical distribution for the statistic.
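The confidence interval procedure can be sketched in a few lines, assuming the profile log-likelihood has already been evaluated on a grid and the scaled statistic has already been computed for each simulated data set. All function names are hypothetical, and the nearest-rank percentile is an assumed convention rather than one stated in the text.

```python
import math


def scaled_lr(pl_hat, pl_value, n_segregating):
    """Scaled likelihood ratio statistic: 2 * (pl(mu_hat) - pl(mu)) / S."""
    return 2.0 * (pl_hat - pl_value) / n_segregating


def empirical_percentile(values, q):
    """Nearest-rank percentile (an assumed convention) of simulated statistics."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def confidence_interval(grid, profile_loglik, n_segregating, threshold):
    """Smallest and largest grid values whose scaled LR is below the threshold.

    Assumes threshold > 0, so that the mle itself is always retained.
    """
    pl_hat = max(profile_loglik)
    kept = [
        mu
        for mu, pl_value in zip(grid, profile_loglik)
        if scaled_lr(pl_hat, pl_value, n_segregating) < threshold
    ]
    return min(kept), max(kept)
```

In use, `threshold` would be `empirical_percentile(simulated_stats, 0.95)`, where `simulated_stats` holds the statistic from each simulated data set. Reporting only the end points assumes the retained grid values form a contiguous range.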
We approximated the 95th percentile of the true distribution by the 95th percentile of the empirical distribution, and included in a 95% confidence interval all values of the parameter for which the scaled likelihood ratio statistic (for the real data) was less than this percentile.

Simulation. Sequence data was simulated for a linear sequence of 7 loci of length 500 bp separated by 10 kb gaps for Campylobacter, and 5 loci of length 500 bp separated by 10 kb gaps for Helicobacter. Details of the sequence simulations (e.g. numbers of samples and segregating sites) were chosen to correspond roughly with the values for the data set of interest, except for the gap length, which does not affect patterns of variation when the gap length is greatly in excess of the mean tract length. First, the ms program of Hudson (2002) was used to construct a treefile (consisting of a set of genealogies for different portions of the sequence) under the standard neutral model, assuming gene conversion with an exponentially distributed tract length. DNA sequence data was then generated from the treefile with the seqgen program of Rambaut and Grassly (1997), under the Jukes-Cantor model of DNA substitution with rate variation between sites corresponding to a gamma distribution with shape parameter 0.5.

3 Results

Interspecies Haplotypes. A qualitative picture of the recombination process in C. jejuni can be obtained by examining the haplotypes of isolates with alleles at either pgm or unc which appear to have come from other strains of Campylobacter (see Table 2 for the allelic profiles of these isolates). We shall call these the “interspecies haplotypes”, and the respective alleles at pgm and unc “interspecies alleles”. (This general approach to learning about recombination is similar in principle to that of Feil et al., 2000). For the isolates with haplotypes A1–A5 (see Table 2), there appears to be a simple picture.
Alleles unc38 and unc42 each differ from unc17 at a single site, and thus haplotypes A3 and A4 each appear to be derived from haplotype A1 by a single mutation. Haplotypes A2 and A5 each appear to be the product of a recombination event at a single gene: A2 differs from A1 only at gly, consistent with a recombination event in gene gly with a tract length of at least 217bps (allele gly2 has frequency 22 in the sample); while A5 appears to have been produced by a recombination event in gene unc of at least 484bps (the allelic profile of A5 at the remaining 6 genes is at frequency 24 in the sample).

Allelic Profile of Interspecies Haplotypes.

ID   asp   gln   glt   gly   pgm   tkt   unc   Number   Source
A1   1     4     2     4     6     3     17    14       Cattle
A2   1     4     2     2     6     3     17    1        Cattle
A3   1     4     2     4     6     3     38    1        Cattle
A4   1     4     2     4     6     3     42    1        Cattle
A5   2     1     1     3     2     1     17    1        Cattle
B1   18    85    22    104   113   105   6     1        Water
B2   18    100   22    104   113   105   6     1        Rabbit

Table 2: The allelic profile, frequencies and sources of the interspecies haplotypes. The interspecies alleles are 17, 38 and 42 at unc (top of table, haplotypes A1–A5) and 113 at pgm (bottom of table, haplotypes B1 and B2).

Inferring the history of haplotypes B1 and B2 (see Table 2) is more difficult. These haplotypes differ solely at gln; allele gln85 only appears on haplotype B1, while gln100 appears in one further isolate. There are five mutational differences between the two gln alleles, including two mutations that only appear in gln100 and one mutation that only appears in gln85. There are no possible recombinations between either of these two interspecies gln alleles and one of the gln alleles in the sample that would produce the other interspecies allele. Perhaps most likely is that haplotype B2 has evolved from B1 via a recombination in gene gln (of at least 142 bps). It is impossible to tell whether the mutation only found in gln85 occurred before or after the recombination event with pgm113.

Detection of Recombination. A qualitative picture of the recombination process in C.
jejuni can be obtained from a plot of pairwise Linkage Disequilibrium (LD) for all segregating sites whose minor allele frequency is greater than 10% (see Figure 1). There is evidence for greater LD within than between genes (the higher proportion of red for comparisons of pairs of sites within genes), but not for greater LD for genes which are closer together on the genome (the approximate exchangeability of the patterns for sites in different genes). This suggests a recombination process with tract lengths that are short compared to the inter-gene distances.

To formally test for the presence of recombination acting over different distances we used an exact permutation test (see Material and Methods). We found significant evidence for recombination acting within genes (p-value < 0.001, based on permutation of polymorphic sites), but no evidence for recombination acting over large distances (of the order of 100kb; p-value 0.64, based on permutation of genes).

Inference of Recombination Process: Validation of Method. To test our method for estimating the rate of recombination and the mean tract length we analysed both (i) simulated data sets and (ii) data from H. pylori, for which there is independent evidence of the recombination tract length (see Material and Methods). For (i) our data was simulated assuming the recombination tract length had an exponential distribution, and under a finite-sites mutation model that had a large degree of rate variation; data was simulated under a model consistent with the data from the cattle isolates. In analysing the data we assumed a gamma distribution for the tract length, with a shape parameter ranging between 1/2 and 2. (The exponential distribution corresponds to the gamma distribution with a shape parameter of 1; for a fixed mean, the variance of the tract lengths doubles as compared to the exponential distribution when a shape parameter of 1/2 is used, and halves with a shape parameter of 2.)
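The statement about variances in the parenthetical above follows from the standard moments of a gamma distribution. The short derivation below is a sketch for the continuous gamma distribution and ignores the discretisation used in the analysis; the scale parameter β is notation introduced here only for this derivation.

```latex
X \sim \mathrm{Gamma}(\alpha, \beta), \qquad
\mathrm{E}[X] = \alpha\beta = \mu \;\Longrightarrow\; \beta = \frac{\mu}{\alpha}, \qquad
\mathrm{Var}(X) = \alpha\beta^{2} = \frac{\mu^{2}}{\alpha}.
% alpha = 1   : Var(X) = mu^2       (exponential distribution)
% alpha = 1/2 : Var(X) = 2 mu^2     (twice the exponential variance)
% alpha = 2   : Var(X) = mu^2 / 2   (half the exponential variance)
```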
For our analysis we assumed a constant mutation rate for each site. The presence of mutation-rate variation in the simulated data biased our inference method towards smaller tract lengths and larger recombination rates. This bias appeared to be caused primarily by the contribution to our pairwise likelihood of nearby pairs of sites. To make inference more robust to this mutation-rate variation we excluded from the pairwise likelihood all pairs of sites which were less than 50bps apart. Histograms of the mles of the tract length and recombination rate from 1,000 simulated data sets are given in Figure 2. The average values of these estimates were 470bps and 6.3 per kb respectively (compared to true values of 500bps and 5 per kb).

We then tested our approach on MLST data from 5 gene fragments in H. pylori. Falush et al. (2001) have obtained estimates of the mean recombination tract length in H. pylori of 417bps (95% credible interval 259–732bps). These estimates are based on data from serial isolates and are independent of the MLST data described in Material and Methods. We repeated the approach above to estimate the tract length from the Maori isolates, the Korean isolates and a sample consisting of both Maori and Korean isolates. Results are given in Table 3, with confidence intervals as described in Material and Methods. For the individual Maori and Korean populations, the estimates of tract length are consistent with those of Falush et al. (2001), but the analysis of the combined data set appears to under-estimate the mean tract length. One explanation for this is that the combined data set strongly violates the random-mating assumption of the model. (For H. pylori, population structure appears to correspond strongly to geographical location; see Falush et al., 2003.)
The presence of population structure will lead to LD decaying more slowly with genetic distance, and thus to relative underestimates of the amount of recombination between as compared to within genes (Smith and Fearnhead, 2004). As the recombination rate between genes is proportional to the product of the actual rate of recombination events and the mean tract length, the net effect will be an underestimate of the mean tract length. For H. pylori the true recombination rate between genes is large (of the order of 25–50 for the combined data set) and so the effect of structure will be particularly pronounced (Smith and Fearnhead, 2004).

Estimates of Tract Length in Helicobacter.

Source   µ̂ (bps)
Maori    500 (225–1400)
Korea    250 (50–1200)
All      175 (100–325)

Table 3: Estimates and, in brackets, approximate 95% confidence intervals of the mean tract length (µ) for H. pylori isolates.

Inference about Recombination Process in C. jejuni. When analysing the C. jejuni isolates we removed all isolates which contained any of the alleles at pgm or unc which are potential recombinants with a different strain of Campylobacter. Our inference method is based on a model of random mating, namely that C. jejuni recombines with other members of C. jejuni at equal rates but never with other species, and the presence of alleles which are descended from a different Campylobacter strain would substantially violate such an assumption. So as both to better fit a model of random mating (there is evidence that C. jejuni is exchanged between sources of the same type at a faster rate than between sources of different types; French et al., 2004), and to potentially detect any differences between them, we analysed the isolates from each of the four main sources (birds, cattle, rabbits and water) separately. Table 4 shows the estimated parameters of the recombination process for isolates from each of the four main sources.
There is substantial variation in both the estimates of the recombination rates and the mean tract lengths across the different sources, but approximate 95% confidence intervals overlap for both parameters across the four sources, except for the recombination rates in cattle and water. The variation in recombination rate may be caused by differences in the effective population size of isolates in different sources; however the estimated mutation rate based on pairwise differences (and also based on the number of segregating sites; data not shown) is similar across the four sources. Estimates of the recombination rate between genes also show variation across the four sources (mles of 11, 6, 4 and 7 for birds, cattle, rabbits and water respectively). In general the recombination rates are smaller than the mutation rates, and if there is considerable mutation rate variation then the estimates of the mutation rates are likely to be under-estimates of the true mutation rate. However, under our definition of recombination rate, each recombination event produces two recombination break points, and thus the effective rate of recombination breaks is twice that given in Table 4.

Recombination Parameter Estimates.

Source    θ̂ per kb   ρ̂ per kb         µ̂ (bps)
Birds     10.3        12 (6.7–22)       450 (250–800)
Cattle    12.1        3.7 (1.7–6.7)     750 (300–1500)
Rabbits   13.7        6.7 (2.7–11.8)    300 (140–1500)
Water     14.4        15 (6.7–23)       225 (80–500)

Table 4: Estimates and, in brackets, approximate 95% confidence intervals of the recombination rate (ρ) and mean tract length (µ) for C. jejuni isolates from 4 different sources. Isolates with alleles at pgm or unc which are potentially descended from a different strain of Campylobacter were omitted from the analysis. For comparison, an estimate of the mutation rate (θ), based on mean pairwise differences, is given for each source.

4 Discussion

We have used population data to make inferences about the recombination process in C. jejuni.
There is very strong evidence both for recombination, and for recombination tract lengths that are small compared to the distances between genes. Our estimates suggest mean tract lengths in the region of 225bp–750bp, and a recombination rate that is similar in magnitude to the mutation rate. These results are consistent with the qualitative patterns we observed in the interspecies haplotypes.

Our estimate of tract length is substantially smaller than the 3.3kb estimate of Schouls et al. (2003), obtained using the method of Feil et al. (2000). This method is based on finding closely related isolates within populations (for example isolates with identical alleles at 6 of the 7 genes), and analysing the genetic differences between such closely related strains. Specific genetic differences are classified as due to either mutation or recombination, with any single base change being classified as a mutation. If recombination tract lengths roughly have an exponential distribution, then many recombination events will change small genomic regions, and thus may alter the DNA at only a single base. As a result, classifying all single base changes as mutations will lead to an overestimate of the recombination tract length, and may explain the larger estimate of Schouls et al. (2003). Furthermore, the estimate of Schouls et al. (2003) is based on a small amount of data and they state that “its validity is somewhat questionable”.

Our results suggest that tract lengths in C. jejuni are similar to those in H. pylori, and much shorter than in other bacteria (where estimates range from 2kb to 14kb; Falush et al., 2001). The similarity with H. pylori is not surprising given the biological similarities of the two pathogens, including the main mechanism of recombination being transformation (Suerbaum et al., 2001). Laboratory studies (De Boer et al., 2002) have reported examples of complete genes being moved via recombination. Such events require recombination tract lengths of several kilobases.
Our results suggest that while such large recombination events can occur, they are likely to be rare (for example if the mean tract length is 500bps, the probability under an exponential model of a tract length in excess of 3kb is around 0.25%), and that the vast majority of recombination events affect much smaller regions of the genome.

Our recombination model allowed for uncertainty in the distribution of the tract length, whereas a more common approach is to assume an exponential distribution. In practice this made little difference to the estimates of the mean tract length (the estimates are around 20% smaller than if the data were analysed under an exponential model), though it produces wider confidence intervals which allow for this extra uncertainty. The data contained little information about the shape parameter.

The information in the data about the recombination rate and tract length is obtained by having Linkage Disequilibrium (LD) information on two different scales: between and within genes. The LD between genes is informative about the product of the recombination rate and tract length, whereas the LD within genes is governed primarily by just the recombination rate (because the gene fragment sizes are similar to the tract length). As a result we have much greater power to estimate both the recombination rate and tract length than we would from a contiguous region of DNA of the same size (Wall, 2004). Note that it should be possible to improve the accuracy of studies such as the one presented here by using the three-site likelihoods presented in Wall (2004).

The two main assumptions of our inference method are that (i) mutation rates are constant across sites; and (ii) the sample of isolates is taken from a randomly mating population. The problem with (i) is that repeat mutation can lead to over-estimates of recombination rates, particularly between nearby sites.
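Two of the quantitative points made earlier in this section, the 0.25% tail probability and the different information carried by within-gene and between-gene LD, can be checked with a short sketch. It assumes an exponential tract length and replaces the sum in the recombination model by its continuous approximation, 2ρµ(1 − exp(−y/µ)); the function names and the example distances are illustrative assumptions.

```python
import math


def tail_probability(x, mean_tract):
    """Pr(X >= x) for an exponential tract length with the given mean (in bp)."""
    return math.exp(-x / mean_tract)


def pairwise_rate(rho, mean_tract, y):
    """Continuous approximation to 2 * rho * sum_{x=1}^{y} Pr(X >= x).

    rho is the recombination rate per bp; y is the distance between sites in bp.
    """
    return 2.0 * rho * mean_tract * (1.0 - math.exp(-y / mean_tract))


p_3kb = tail_probability(3000, 500)           # about 0.0025, i.e. 0.25%
within = pairwise_rate(0.015, 225, 250)       # two sites 250bp apart
between = pairwise_rate(0.015, 225, 100_000)  # two sites 100kb apart
```

For distances much larger than the mean tract length the rate tends to 2ρµ, the product of rate and tract length, while for distances much smaller than the mean tract length it is approximately 2ρy and does not depend on µ. With the water estimates in Table 4 (15 per kb and 225bps), 2ρµ is 6.75, which is close to the between-gene value of 7 reported for water in the Results.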
However, we have demonstrated our robustness to (i) through simulation, where we are able to accurately estimate tract lengths (and recombination rates) in the presence of considerable rate variation, and through obtaining reasonable estimates of tract lengths for H. pylori. The problem with (ii) is that population structure affects the decay of LD, producing excess LD over long distances. This could cause an excess of LD between genes relative to within genes, and hence an under-estimate of the mean tract length. Some evidence of this effect was noted in the analysis of the H. pylori data, with smaller estimates of tract length obtained for a mixed population of Korean and Maori isolates. To minimise this problem we analysed the isolates from each of the four sources separately, and removed the interspecies haplotypes.

We further tested the robustness of our approach to population structure via a simulation study, with parameters fixed to those estimated for the water isolates. We assumed a two-island demographic model (Donnelly and Tavaré, 1995), with three different migration rates: 20, 5 and 1. These produce FST values ranging from approximately 0.01 to 0.2 (Hudson et al., 1992); the larger values of FST correspond to stronger population structure. The mean of the estimates of the mean tract length across 100 simulated data sets for each scenario varied from 245 to 230 (compared to the truth of 225; the over-estimation is due to the skewed distribution of the mles in each case). These results suggest that the method is robust to the degree of population structure present in these simulations. As a rough comparison, the FST values for C. jejuni isolates presented in Colles et al. (2003) vary from 0.005 to 0.094 for a comparison of human and animal isolates.

Acknowledgements

This work was supported by EPSRC grant GR/S18786/01, and by the Environment Institute and the Department for Environment, Food and Rural Affairs (DEFRA).
We thank Daniel Falush for helpful comments.

References

Awadalla, P. (2003). The evolutionary genomics of pathogen recombination. Nature Reviews Genetics 4, 50–60.

Claus, H., Maiden, M. C. J., Maag, R., Frosch, M. and Vogel, U. (2002). Many carried meningococci lack the genes required for capsule synthesis and transport. Microbiology-SGM 148, 1813–1819.

Colles, F. M., Jones, K., Harding, R. M. and Maiden, M. C. J. (2003). Genetic diversity of Campylobacter jejuni isolates from farm animals and the farm environment. Applied and Environmental Microbiology 69, 7409–7413.

De Boer, P., Wagenaar, J. A., Achterberg, R. P., van Putten, J. P. M., Schouls, L. M. and Duim, B. (2002). Generation of Campylobacter jejuni genetic diversity in vivo. Molecular Microbiology 44, 351–359.

Donnelly, P. and Tavaré, S. (1995). Coalescents and genealogical structure under neutrality. Annual Review of Genetics 29, 401–421.

Falush, D., Kraft, C., Taylor, N. S., Correa, P., Fox, J. G., Achtman, M. and Suerbaum, S. (2001). Recombination and mutation during long-term gastric colonization by Helicobacter pylori: Estimates of clock rates, recombination size, and minimal age. PNAS 98, 15056–15061.

Falush, D., Wirth, T., Linz, B., Pritchard, J. K., Stephens, M., Kidd, M., Blaser, M. J., Graham, D. Y., Vacher, S., Perez-Perez, G. I., Yamaoka, Y., Megraud, F., Otto, K., Reichard, U., Katzowitsch, E., Wang, X. Y., Achtman, M. and Suerbaum, S. (2003). Traces of human migrations in Helicobacter pylori populations. Science 299, 1582–1585.

Fearnhead, P. (2003). Consistency of estimators of the population-scaled recombination rate. Theoretical Population Biology 64, 67–79.

Fearnhead, P. and Donnelly, P. (2001). Estimating recombination rates from population genetic data. Genetics 159, 1299–1318.

Feil, E. J., Smith, J. M., Enright, M. C. and Spratt, B. G. (2000). Estimating recombinational parameters in Streptococcus pneumoniae from multilocus sequence typing data. Genetics 154, 1439–1450.

French, N. P., Barrigas, M., Brown, P., Ribiero, P., Williams, N. J., Leatherbarrow, H., Birtles, R., Bolton, E., Fearnhead, P. and Fox, A. (2004). Spatial epidemiology and natural population structure of Campylobacter jejuni colonising a farmland ecosystem. Submitted to Environmental Microbiology.

Frisse, L., Hudson, R. R., Bartoszewicz, A., Wall, J. D., Donfack, J. and Di Rienzo, A. (2001). Gene conversion and different population histories may explain the contrast between polymorphism and linkage disequilibrium levels. American Journal of Human Genetics 69, 831–843.

Hilliker, A. J., Harauz, G., Reaume, A. G., Gray, M., Clark, S. H. and Chovnick, A. (1994). Meiotic gene conversion tract length distribution within the rosy locus of Drosophila melanogaster. Genetics 137, 1019–1026.

Hudson, R. R. (2001). Two-locus sampling distributions and their application. Genetics 159, 1805–1817.

Hudson, R. R. (2002). Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18, 337–338.

Hudson, R. R., Slatkin, M. and Maddison, W. P. (1992). Estimation of levels of gene flow from DNA-sequence data. Genetics 132, 583–589.

McVean, G. A. T., Awadalla, P. and Fearnhead, P. (2002). A coalescent method for detecting recombination from gene sequences. Genetics 160, 1231–1241.

McVean, G. A. T., Myers, S. R., Hunt, S., Deloukas, P., Bentley, D. R. and Donnelly, P. (2004). The fine-scale structure of recombination rate variation in the human genome. Science 304, 581–584.

Rambaut, A. and Grassly, N. C. (1997). Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput Appl Biosci 13, 235–238.

Schouls, L. M., Reulen, S., Duim, B., Wagenaar, J. A., Willems, R. J. L., Dingle, K. E., Colles, F. M. and Van Embden, J. D. A. (2003). Comparative genotyping of Campylobacter jejuni by amplified fragment length polymorphism, multilocus sequence typing, and short repeat sequencing: Strain diversity, host range, and recombination. Journal of Clinical Microbiology 41, 15–26.

Smith, N. G. C. and Fearnhead, P. (2004). Comparative performance and robustness of three estimators of the recombination rate. In preparation.

Stumpf, M. P. H. and McVean, G. A. T. (2003). Estimating recombination rates from population-genetic data. Nature Reviews Genetics 4, 959–968.

Suerbaum, S., Lohrengel, M., Sonnevend, A., Ruberg, F. and Kist, M. (2001). Allelic diversity and recombination in Campylobacter jejuni. Journal of Bacteriology 183, 2553–2559.

Wall, J. D. (2004). Estimating recombination rates using three site likelihoods. Genetics 167, 1461–1473.

Wiuf, C. and Hein, J. (2000). The coalescent with gene conversion. Genetics 155, 451–462.

[Figure 1 image not reproduced; both axes show position in the concatenated sequence (0–3000bp), with gene boundaries marked.]

Figure 1: Pairwise plot of Linkage Disequilibrium (LD; as measured by D′ below the diagonal and the Likelihood Ratio statistic for linkage equilibrium above the diagonal). The MLST data was concatenated (in the order the genes appear on the genome: asp, gln, glt, pgm, unc, gly and tkt), and the boundary of each gene is marked by a black line. Each box represents a pair of segregating sites; the boundaries of the boxes are equidistant between neighbouring segregating sites, and the colour of the box shows the amount of LD (ranging from red, high LD, to white, low LD) between the two sites. Only sites with minor allele frequency greater than 10% are included in the plot.

[Figure 2 image not reproduced; two histograms with Frequency on the vertical axis, and Tract Length (left) and Recombination Rate per kb (right) on the horizontal axes.]

Figure 2: Histograms of mles of the tract length (left) and the recombination rate (right) from 1,000 simulated samples. The true values were 500bp and 5 per kb respectively.