Genomic epidemiology of Campylobacter jejuni

Mirko Rossi, DVM, Associate Professor
University of Helsinki
DISCLAIMER
Some of the data presented are part of the INNUENDO project, which has received funding
from the European Food Safety Authority (EFSA), grant agreement
GP/EFSA/AFSCO/2015/01/CT2 (New approaches in identifying and
characterizing microbial and chemical hazards). The conclusions, findings, and
opinions expressed in this presentation reflect only the views of the authors and
not the official position of EFSA.
CHICKEN IS THE MAIN RESERVOIR
[Figure labels: 65.4%; 25.6%; 9%; ~3%; Jun/Sep; total ~90/10^5; domestic ~20/10^5]
Source attribution of Finnish human cases is similar to that in other countries
Chicken and ruminants (bovine) are the MAIN RESERVOIRS
(de Haan et al., 2012)
IS CHICKEN MEAT THE DIRECT SOURCE OF INFECTION
IN DOMESTIC CASES?
- Decreased sero-genotype association between human and chicken isolates when accounting for temporal clustering (Kärenlampi 2003 JCM)
- Temporal shift between poultry and humans for the two dominant C. jejuni STs (Kovanen 2016 IJFM)
- Peak in humans in week 26 (July); peak in poultry in August (Llarena 2014 PLoSONE)
→ Do humans and poultry share a common source?
Finnish domestic cases and poultry isolates from 2012
FROM SOURCE ATTRIBUTION TO SOURCE
TRACKING
EFSA Panel on Biological Hazards (BIOHAZ), 2010:
- Chicken as a reservoir might account for up to 80% of human cases
- Broiler meat as a direct source might account for only 20-30% → 50-60% unknown transmission routes
In Norway, contemporaneous space-time clusters of Campylobacter spp. in humans and in
broilers → shared risk factors (Jonsson 2010 IJHG)
→ interventions targeting those common factors would be more effective in
reducing Campylobacter infections
FROM SOURCE ATTRIBUTION TO SOURCE
TRACKING
Spatial-temporal clusters of apparently sporadic cases → diffuse outbreaks are
probably common
- Norway: up to 19.6% of human cases (maximum radius of 50 km and maximum time of 30 days) (Jonsson 2010 IJHG)
- UK: ~13% of human cases (lasting up to 50 days); 5 times more common than general or
household outbreaks (Strachan & Forbes 2014; Gabriel 2010 EAI)
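The space-time windows quoted above (50 km, 30 days) can be illustrated with a minimal Python sketch that links pairs of cases falling inside both windows. This is only a pairwise filter under stated assumptions, not the cluster-detection method of the cited studies; the case records, coordinates and function names are hypothetical.

```python
from datetime import date
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def linked_pairs(cases, max_km=50.0, max_days=30):
    """Return pairs of case IDs that are close in both space and time.

    cases: iterable of (case_id, latitude, longitude, date) tuples.
    The default windows mirror the figures quoted on the slide.
    """
    pairs = []
    for (id_a, lat_a, lon_a, day_a), (id_b, lat_b, lon_b, day_b) in combinations(cases, 2):
        close_in_time = abs((day_a - day_b).days) <= max_days
        close_in_space = haversine_km(lat_a, lon_a, lat_b, lon_b) <= max_km
        if close_in_time and close_in_space:
            pairs.append((id_a, id_b))
    return pairs


# Hypothetical case records, for illustration only.
cases = [
    ("case1", 60.17, 24.94, date(2012, 7, 1)),
    ("case2", 60.21, 25.08, date(2012, 7, 10)),  # a few km and 9 days from case1
    ("case3", 65.01, 25.47, date(2012, 7, 5)),   # several hundred km away
    ("case4", 60.17, 24.94, date(2012, 9, 15)),  # same place, outside the time window
]
print(linked_pairs(cases))
```

Pairs found this way are only candidates: genomic typing is still needed to decide whether linked cases share a source.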
WHAT ARE THE MOST LIKELY SOURCES OF THESE CLUSTERS?
→ Integration of genome sequencing in surveillance for detection of
epidemiologically related cases occurring in a less demographically distinct group
(Fernandes 2015 CID)
GENOMIC DIVERSITY OF
CAMPYLOBACTER JEJUNI
Genomic diversity within an
outbreak & Epidemic and global
dispersion
GENOMIC DIVERSITY WITHIN AN OUTBREAK
1-Genomic variation after a single human
passage
Genetic heterogeneity of Campylobacter jejuni NCTC 11168 upon human
infection Revez 2013 IGE
2-Genomics of point-source Campylobacter
outbreaks
Genomic Variation between Campylobacter jejuni Isolates Associated with
Milk-Borne-Disease Outbreaks Revez 2014 JCM
Genome analysis of Campylobacter jejuni strains isolated from a
waterborne outbreak Revez 2014 BMC genomics
Refinement of whole-genome multilocus sequence typing analysis by
addressing gene paralogy Zhang 2015 JCM
EPIDEMIC AND GLOBAL DISPERSION
3-Discovering diffuse Campylobacter outbreaks using
genomics
Multilocus Sequence Typing (MLST) and Whole-Genome MLST of Campylobacter jejuni
Isolates from Human Infections in Three Districts during a Seasonal Peak in Finland
Kovanen 2014 JCM
Tracing isolates from domestic human Campylobacter jejuni infections to chicken
slaughter batches and swimming water using whole-genome multilocus sequence
typing Kovanen 2016 IJFM
4-Genome phylogeography of C. jejuni and its
implications for epidemiology
Monomorphic genotypes within a generalist lineage of Campylobacter jejuni show
signs of global dispersion Llarena 2016 MGen
GENETIC VARIATION AFTER HUMAN PASSAGE
Revez 2013 IGE
11168 - p < 0.01
Single deletion in a single gene
Frequencies of in-frame status of contingency genes:
- Cj1139c: β-1,3-galactosyltransferase
- Cj0045c: iron-binding protein
- Cj1145c & Cj0456c: hypothetical proteins
Confirmed by independent research on 11168
(although in different sets of genes): deletions in two
loci (Thomas 2014 PLoSONE)
Van Amsterdam 2006 FEMS
STANDING GENETIC VARIATION IN CONTINGENCY
LOCI AND RAPID ADAPTATION
Homopolymeric runs → rapid variation during population growth in the absence of
selection (rate ~10- to 100-fold faster) (Bayliss 2012 NAR)
Bottleneck effects and selection are responsible for the differences observed in vivo (host
specificity: human ≠ mice ≠ chicken)
E.g. expression of specific LOS structures → immune evasion (variation of Cj1139c,
GM1/GM2)
→ Baseline for variation within an outbreak (1-2 SNVs + indels in homopolymeric runs)
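Since homopolymeric runs carry most of the within-outbreak variation, it can help to locate them explicitly. The following Python sketch finds single-base runs in a nucleotide sequence and checks whether a change in run length keeps a coding sequence in frame. The minimum run length of 8 and the function names are assumptions chosen for illustration, not values taken from the cited papers.

```python
import re


def homopolymeric_runs(seq, min_len=8):
    """Return (start, base, length) for every single-base run of at least min_len.

    min_len is an illustrative threshold (assumption), and must be >= 2.
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # One captured base followed by at least (min_len - 1) repeats of itself.
    pattern = re.compile(rf"([ACGT])\1{{{min_len - 1},}}")
    return [
        (m.start(), m.group(1), m.end() - m.start())
        for m in pattern.finditer(seq.upper())
    ]


def stays_in_frame(reference_run_len, observed_run_len):
    """True if the run-length change is a multiple of 3 (no frameshift)."""
    return (observed_run_len - reference_run_len) % 3 == 0


# Hypothetical sequence: a 9-base G run and an 8-base A run.
example = "AT" + "G" * 9 + "CT" + "A" * 8 + "T"
print(homopolymeric_runs(example))
print(stays_in_frame(9, 10))  # one extra base in the run shifts the frame
```

Masking or separately reporting the loci that contain such runs is one way to compare isolates on SNVs alone.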
GENOMICS OF THE MILK-BORNE
CAMPYLOBACTER ST-50 OUTBREAK 2002
Revez 2014 JCM
All PubMLST alleles for 1,738 loci
1,432 shared loci
Customized BLAST+ script → wgMLST
(extract all the shared loci with allele
information in all the samples + new alleles)
Intra-outbreak → up to 12 allele differences
affecting 15 genes
80% of the differences = homopolymeric
runs (only 3 SNVs, Po_1 and Le_204R)
→ ~250 allele differences from the ST-50 of a
separate milk-borne outbreak > 10 years
apart
1,404 shared loci
[Figure labels: 16th Dec; 8th Jan; 18th Feb; Pe = patient; Ma = milk; Le = bovine; Finnish hen; UK; outbreak strains]
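The wgMLST comparison described on this slide (keep only loci called in every sample, then count differing alleles) can be sketched in Python as follows. This is not the customized BLAST+ script from the study; the profile layout (a dict of locus to allele number, with None for a missing call) and all isolate and locus names are hypothetical.

```python
def shared_loci(profiles):
    """Loci that have an allele call (not None) in every isolate."""
    called = [
        {locus for locus, allele in profile.items() if allele is not None}
        for profile in profiles.values()
    ]
    return sorted(set.intersection(*called))


def allele_differences(profile_a, profile_b, loci):
    """Number of loci in `loci` where the two profiles carry different alleles."""
    return sum(1 for locus in loci if profile_a[locus] != profile_b[locus])


# Hypothetical allele profiles, for illustration only.
profiles = {
    "isolate_A": {"locus_1": 1, "locus_2": 5, "locus_3": 2, "locus_4": 7},
    "isolate_B": {"locus_1": 1, "locus_2": 5, "locus_3": 3, "locus_4": None},
    "isolate_C": {"locus_1": 1, "locus_2": 6, "locus_3": 2, "locus_4": 7},
}

loci = shared_loci(profiles)  # locus_4 is dropped: not called in isolate_B
for a, b in [("isolate_A", "isolate_B"), ("isolate_A", "isolate_C"), ("isolate_B", "isolate_C")]:
    print(a, b, allele_differences(profiles[a], profiles[b], loci))
```

Because the shared-locus set depends on which isolates are included, allele-difference counts are only comparable within the same analysis.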
GENOMICS OF THE WATERBORNE CAMPYLOBACTER
OUTBREAK (2000)
Revez 2014 BMCg
Manually filtered isolated SNVs → 3
and 69
Using BIGSdb to retrieve allele profiles
for all → 1,287 matches in the database:
→ 8-23 allele differences
Difference from strains isolated 12 years
apart with the same PFGE profile = 64
SNVs (~8 allele differences)
Phylogenomics and genealogy revealed
two strains circulating in the outbreak
Av.d. 0.0175 (~23 allele differences)
Av.d. 0.0061 (~8 allele differences)
IHV116260 vs 4031 → 3 SNVs
IHV116292 vs 4031 → 69 SNVs
6236/12 vs 4031 → 64 SNVs
GENOMICS OF POINT-SOURCE CAMPYLOBACTER
OUTBREAKS
Zhang 2015 JCM
1-7 allele differences when using
wgMLST with ~1,200-1,500 shared loci
[Figure: pairwise allelic diversity matrix (allele differences between isolates), with groups labelled "Poultry farm epi-linked strains" and "Waterborne outbreak"]
DISCOVERING DIFFUSE CAMPYLOBACTER OUTBREAKS
USING GENOMICS
wgMLST
Human domestic cases from 2012 in three
hospital districts
Whole-genome analysis using ~70% of total
loci + BIGSdb → clusters of 1-8 allele
differences
Genetically closely related isolates within the
STs → possible diffuse outbreaks?
[Figure labels: 1,121; 1,264]
Genealogy (ClonalFrame) revealed:
- genetic diversity within nearly identical wgMLST types
(e.g. cluster 1, ST-45)
- higher similarity in pairs with high allele diversity
(e.g. cluster 2, ST-45)
Kovanen 2014 JCM
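Grouping isolates that lie within a few allele differences of each other can be sketched as single-linkage clustering over pairwise distances. The study used BIGSdb; the union-find approach, the threshold of 8 (taken from the "1-8 allele differences" range above) and the isolate names and distances below are assumptions for illustration.

```python
def single_linkage_clusters(isolates, distances, max_diff=8):
    """Group isolates connected by chains of pairs with <= max_diff allele differences.

    distances: dict mapping (isolate_a, isolate_b) to an allele-difference count.
    """
    parent = {name: name for name in isolates}

    def find(name):
        # Walk to the cluster representative, compressing the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), diff in distances.items():
        if diff <= max_diff:
            parent[find(a)] = find(b)

    clusters = {}
    for name in isolates:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise allele differences, for illustration only.
isolates = ["A", "B", "C", "D", "E"]
distances = {
    ("A", "B"): 2, ("B", "C"): 8, ("A", "C"): 9,
    ("D", "E"): 5,
    ("A", "D"): 120, ("B", "D"): 118, ("C", "D"): 121,
    ("A", "E"): 119, ("B", "E"): 117, ("C", "E"): 122,
}
print(single_linkage_clusters(isolates, distances))
```

Note that A and C end up in the same cluster although they differ by 9 alleles, because both are linked through B. This chaining is one reason a genealogy can still show diversity inside a cluster.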
TRACING THE SOURCE OF A POSSIBLE DIFFUSE
CAMPYLOBACTER OUTBREAK
Kovanen 2016 IJFM
Hierarchical approach → discovers more
diversity (Genome Profiler, Zhang 2015
JCM)
Coupled with temporal clustering
GENOMIC ANATOMY OF CAMPYLOBACTER
OUTBREAKS
Changes in homopolymeric tracts were the main differences within epi-linked
strains
→ 1-12 allele differences ~ 3 SNVs
In a British milk-borne outbreak → mean of 4 allele differences/1,577
loci (Fernandes 2015 CID):
- equivalent to the differences seen between 2 isolates from the same patient
- do not appear to occur among isolates with no epidemiological relationship (Cody 2013 JCM)
Diffuse Campylobacter outbreaks are possible, but their sources are unknown
Information on the baseline genomic diversity of the C. jejuni population is missing;
how much does that affect epidemiology?
MONOMORPHIC GENOTYPES WITHIN
CAMPYLOBACTER CLONAL COMPLEX
Spatial temporal evolution of ST-45
clonal complex
Llarena 2016 MGen
Genealogy reconstruction of ST-45 CC
- UK, Finland and Baltic countries
- From 1999 to 2013
Little or no spatial clustering; no
temporal clustering
Little genetic diversity over time and
space for certain populations (green and
violet; arrows)
MLST typing → not always correlated
with genealogy
[Figure legend: country: red = UK, blue = Finland; year of isolation: different colour = different year; branch colour = BAPS population]
GLOBAL DISPERSION OF MONOMORPHIC
GENOTYPES
Llarena 2016 MGen
Lack of genetic isolation by distance
Sampling date had a weak correlation
with the root-to-tip distance
Overall genetic diversity in 12 years ~
genetic diversity in a single year
→ Sign of global dispersion
Predicted TMRCA using the Wilson 2009 mutation rate does NOT FIT the SAMPLING DATES
[Figure labels: geese; Canada 2011; Italy 2015]
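The check for temporal signal mentioned above (correlation between sampling date and root-to-tip distance) amounts to a simple linear regression. The Python sketch below computes slope, intercept and Pearson correlation by hand; it is a generic illustration, not the analysis from Llarena 2016, and all years and distances are hypothetical.

```python
from math import sqrt


def root_to_tip_regression(sampling_years, root_to_tip):
    """Least-squares fit of root-to-tip distance on sampling year.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    n = len(sampling_years)
    if n != len(root_to_tip) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(sampling_years) / n
    mean_y = sum(root_to_tip) / n
    sxx = sum((x - mean_x) ** 2 for x in sampling_years)
    syy = sum((y - mean_y) ** 2 for y in root_to_tip)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sampling_years, root_to_tip))
    if sxx == 0 or syy == 0:
        raise ValueError("no variance in dates or distances")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / sqrt(sxx * syy)


# Hypothetical clock-like data: distance grows steadily with sampling year.
clock_like = root_to_tip_regression([2000, 2005, 2010], [0.001, 0.002, 0.003])

# Hypothetical data with almost no temporal signal.
weak_signal = root_to_tip_regression(
    [1999, 2003, 2008, 2013], [0.0030, 0.0028, 0.0031, 0.0029]
)
print(clock_like, weak_signal)
```

With a correlation close to zero, the slope is not a usable rate estimate, so a date for the most recent common ancestor cannot be read off the regression.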
IMPLICATION FOR EPIDEMIOLOGY
Llarena 2016 MGen
Separation of clustered and sporadic
cases based only on genomic diversity is
not possible
Dominant circulating clone → simulates
diffuse outbreaks
Different STs have different "histories" →
difficult to predict species-wide dynamics
Unknown reason for the expansion of these
clones → neutral evolution in the form of
mild purifying selection vs sporadic
selection
[Figure annotation: these are actually the BAPS 6 monomorphic clones (Kovanen 2014 JCM)]
You can find everything
about my presentation in
a recent review available
as a pre-print:
http://biorxiv.org/content/early/2016/10/01/078550
WG/CG-MLST
Surveillance and outbreak
investigation
REFINEMENT OF WG-MLST BY ADDRESSING GENE
PARALOGY
Zhang 2015 JCM
Easy-to-use local ad hoc analysis
GeP addresses paralogy with Conserved
Gene Neighborhoods
I. Failing to distinguish orthologous from paralogous genes
II. Assigning a missing locus as an allele
III. Missing alleles due to high sequence diversity
IV. Including ambiguities in allele definitions
V. Alleles composed of overlapping loci
VI. Excluding homopolymeric runs
REFINEMENT OF WG-MLST BY ADDRESSING GENE
PARALOGY
Zhang 2015 JCM
https://sourceforge.net/projects/genomeprofiler/ (~300 downloads worldwide)
GeP is precise but quite slow, especially
for highly divergent strains
It is designed for ad hoc wgMLST
analysis → useful for extracting core
alignments and for small analyses
Regardless of the analysis → the
topologies of the split graphs are very
similar
INNUENDO: A STANDARDIZED CROSS-SECTORIAL FRAMEWORK
A portable platform for automatic real-time application of wg/cgMLST in
public health settings
A predictive model for forecasting
epidemiological relationships between
isolates
Place project stakeholders at the center
of the platform design and development
Campylobacter, Yersinia, STEC, Salmonella
[Figure labels: data collection; data analysis; proof-of-concept; platform development]
PROJECT WORKFLOW
A. Quality assurance (the product is the "correct" cgMLST profile)
   I. Define the general measures and write the pipeline (WP3)
   II. Define the effect of species-specific cut-off values on allele calling (WP2)
B. Calibration – known samples
   i. cgMLST schema development/selection
   ii. Cut-off for epi-linked strains
C. Validation – unknown samples
   i. Resolution of cgMLST (when do we need ad hoc schemas?)
   ii. Inferring transmission patterns
QUALITY ASSURANCE - INNUCA
https://github.com/INNUENDOCON/INNUca
INNUENDO quality control of reads, de novo assembly and
contigs quality assessment, and possible contamination search
1. INNUca v1 → theoretical
coverage + assembly statistics
2. INNUca v2, species-specific:
   A. "True" coverage estimation using reference genes (ReMatCh)
   B. Probability of identifying non-axenic samples
   C. Effect of coverage on allele calling → defining the cut-off
[Figure labels: INNUca v1.0; INNUca v2.0; coverage estimation + contamination (ReMatCh); assembly correction (Pilon)]
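As a rough illustration of what "theoretical coverage + assembly statistics" can mean, the Python sketch below computes coverage as total sequenced bases divided by an expected genome size, and N50 as one common assembly statistic. This is not INNUca code; the choice of N50, the function names and the example numbers are assumptions.

```python
def theoretical_coverage(read_lengths, expected_genome_size):
    """Total sequenced bases divided by the expected genome size (assumed known)."""
    if expected_genome_size <= 0:
        raise ValueError("expected_genome_size must be positive")
    return sum(read_lengths) / expected_genome_size


def n50(contig_lengths):
    """Length of the contig at which half of the assembly is reached,
    walking contigs from longest to shortest."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical toy numbers, for illustration only.
reads = [150] * 1000          # 1,000 reads of 150 bases
print(theoretical_coverage(reads, expected_genome_size=1500))
print(n50([100, 200, 300, 400]))
```

Theoretical coverage ignores reads that do not map to the target genome, which is why a reference-based estimate can differ from it in contaminated samples.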
THE ALLELE CALLING ENGINE: CHEWBBACA
https://github.com/mickaelsilva/chewBBACA
Twitter: @jacarrico
University of Lisbon
"Comprehensive and Highly Efficient Workflow" BSR-Based Allele Calling Algorithm
Faster than every other allele caller
Portable (does not need a lot of computing power)
Allele call outputs:
- LOT: locus on the tip
- PLOT: possible LOT
- Size threshold selection: allele length mode +/- 20%
- More than one match with BSR > 0.6: Non-Informative Paralogous Locus
- ASM/ALM: allele smaller than mode / allele larger than mode
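Two of the rules listed above, the length window around the mode (+/- 20%) and the BSR > 0.6 paralogy flag, can be sketched in Python. This is a simplified illustration and not chewBBACA's implementation; the BLAST score ratio is taken here as the alignment score divided by the reference allele's self-alignment score, and all function names and example values are hypothetical.

```python
from statistics import mode


def blast_score_ratio(alignment_score, self_score):
    """BSR: score of the hit relative to the reference allele aligned to itself."""
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return alignment_score / self_score


def classify_by_length(candidate_length, known_allele_lengths, tolerance=0.2):
    """Flag a candidate allele as ASM/ALM if it falls outside mode +/- tolerance."""
    modal_length = mode(known_allele_lengths)
    if candidate_length < modal_length * (1 - tolerance):
        return "ASM"  # allele smaller than mode
    if candidate_length > modal_length * (1 + tolerance):
        return "ALM"  # allele larger than mode
    return "within size threshold"


def is_non_informative_paralog(bsr_values, threshold=0.6):
    """True if more than one match in the genome exceeds the BSR threshold."""
    return sum(1 for bsr in bsr_values if bsr > threshold) > 1


# Hypothetical values, for illustration only.
known_lengths = [1000, 1000, 990]
print(classify_by_length(700, known_lengths))
print(classify_by_length(1250, known_lengths))
print(is_non_informative_paralog([0.95, 0.70]))
```

Loci flagged in this way are typically excluded from the profile comparison rather than forced into an allele call.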
DEFINING THE CORRECT SCHEMA,
NOMENCLATURE AND WORKFLOW
C. coli/C. jejuni v1 from PubMLST → problem with the definition of the core genome
→ build a novel C. jejuni cgMLST schema → validate each locus
→ test different "small" schemas = fractions of the core genome (collaboration with Ed Taboada)
Hierarchical WGS typing (inclusion of ad hoc analysis) → 2 phases of the analysis:
1. surveillance (which can even be based on a kind of small cgMLST) → link to nomenclature
2. outbreak investigation, which is based on ad hoc wgMLST
[Figure labels: cut-off definition for population structure; cut-off for epi-linked strains as % of shared genes; cgMLST schema or small cgMLST; ad hoc wgMLST analysis for outbreak investigation]
Acknowledgment
The INNUENDO consortium
www.innuendoweb.org
Funding agencies:
- Academy of Finland
- EFSA
- University of Helsinki
- Walter Ehrströmin Säätiö
- Ministry of Agriculture and Forestry, Finland
- ERA-NET
UniHelsinki
- Prof. Emerita Marja-Liisa Hänninen
- Prof. Jukka Corander
- Dr. Ann-Katrin Llarena
- Dr. Joana Revez (now ECDC)
- Dr. Ji Zhang (now Massey University, NZ)
- Dr. Schott Thomas (now Rostock University)
- Dr. Rauni Kivistö
- Dr. Astrid Skarp (now Hogeschool Rotterdam)
- PhD student Sara Kovanen
- Dr. Vehkala M., Dr. Välimäki N.
University of Lisbon (PT)
- Dr. Joao Carrico
- PhD student Miguel Machado
- PhD student Bruno Goncalves
- PhD student Mickael Silva
University of Basque Countries (ES)
- Prof. Javier Garaizar
- Dr. Joseba Bikandi
University of Veterinary Medicine (Austria)
- Prof. Friederike Hilbert
EVIRA
- Dr. Marjaana Hakkinen
IZS Teramo (Italy)
- Dr. Elisabetta Giannatale, Dr. Giuliano Garofolo, Dr. Cesare Cammà
THL
- Dr. Saara Salmenlinna
- Dr. Jani Halkilahti
UniTartu (Estonia)
- Prof. Mati Roasto, PhD student Mäesaar M.
INSA
- Dr. Monica Oleastro
- Dr. Vitor Borges
Public Health Agency of Canada
- Eduardo Taboada, Dillon Barker
For providing the human strains
- Dr. Kärkkäinen U.M.
- Dr. Tuuminen T.
- Dr. Uksila J.
- Prof. Rautelin H (Uppsala University)
Thanks for your attention