
hqSNP, wgMLST and the WGS
alphabet soup: what
epidemiologists need to know
Martin Wiedmann
Cornell University
E-mail: [email protected]
Outline
• Review of genomes, genes, and evolution
• Use of sequence data to assess relatedness of
organisms
• Data analysis approaches
• wgMLST and hqSNP
• Trees and how to interpret them
What is a SNP?
• Single Nucleotide Polymorphism (SNP)
ATGTTCCTC sequence
ATGTTGCTC reference
*phylogenetically informative differences
• Insertion or Deletion (Indel)
ATGTTCCCTC sequence
ATGTTC-CTC reference
*differences not used in hqSNP analysis
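The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming the two sequences are already aligned and that `-` marks a gap; the function name `classify_differences` is made up for this example.

```python
def classify_differences(sequence: str, reference: str):
    """Compare two aligned, equal-length strings column by column.

    Returns (snps, indels) as lists of 1-based alignment columns.
    A '-' in either string is treated as an insertion or deletion.
    """
    if len(sequence) != len(reference):
        raise ValueError("aligned sequences must have the same length")
    snps, indels = [], []
    for column, (s, r) in enumerate(zip(sequence, reference), start=1):
        if s == r:
            continue
        if s == "-" or r == "-":
            indels.append(column)   # not used in hqSNP analysis
        else:
            snps.append(column)     # candidate SNP
    return snps, indels


# The two examples from the slide
print(classify_differences("ATGTTCCTC", "ATGTTGCTC"))    # one SNP, column 6
print(classify_differences("ATGTTCCCTC", "ATGTTC-CTC"))  # one indel, column 7
```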
Microbial evolution 101 –
mechanisms of change
Point mutations
ACCCTCTAGTAGTAGCA
ACCATCTAGTAGTAGCA
ACCCTCTAGTAGTAGCA
1 SNP and one “genetic event”
Microbial evolution 101 –
mechanisms of change
Insertion or deletion
ACCCTCTAGTAGTAGCA
ACCATCTAG . . . TAGCA
ACCCTCTAGTAGTAGTAGCA
3 differences (?) and one “genetic event”
Microbial evolution 101 –
mechanisms of change
Inversion
ACCCTCTAGTAGTAGCA
ACCATCTCGTAGTAGCA
ACCCTCTAGTAGTAGCA
Alignment:
ACCATCTCGTAGTAGCA
ACCCTCTAGTAGTAGCA
2 SNPs and one “genetic event”
Microbial evolution 101 –
mechanisms of change
Horizontal gene transfer of homologous gene sequences
ACCCTCTAGTACTAGCATCC
TCCCTCTTGTCCTACCATCA
CTTGTCCTACCA
CTTGTCCTACCA
ACCCTCTAGTACTAGCATCC
ACCCTCTTGTCCTACCATCC
Alignment:
ACCCTCTAGTACTAGCATCC
ACCCTCTTGTCCTACCATCC
3 SNPs and 1 genetic event
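The transfer shown above can be replayed as a small sketch: one replacement of a homologous stretch (a single genetic event) yields several SNPs at once. The sequences are the ones on the slide; the helper name `count_snps` and the way the fragment position is located in the donor are assumptions made for this illustration.

```python
def count_snps(a: str, b: str) -> int:
    """Number of mismatching columns between two equal-length sequences."""
    return sum(1 for x, y in zip(a, b) if x != y)


recipient = "ACCCTCTAGTACTAGCATCC"
donor = "TCCCTCTTGTCCTACCATCA"
fragment = "CTTGTCCTACCA"

# Assumption: the fragment replaces the homologous stretch at the same
# offset at which it occurs in the donor sequence.
start = donor.find(fragment)
recombinant = recipient[:start] + fragment + recipient[start + len(fragment):]

print(recombinant)                          # ACCCTCTTGTCCTACCATCC
print(count_snps(recipient, recombinant))   # 3 SNPs from 1 genetic event
```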
Microbial evolution 101 –
mechanisms of change
Transformation
Transduction
Case study – why does it matter
• Human listeriosis outbreak in 2000 with 29 cases
• Outbreak isolates differ by 1 SNP from the food and human isolates
from a single case linked to processing facility X in 1988
• Epidemiology supports that this facility was the source of the outbreak
• Some analysis approaches that do not account for
recombination would have shown that the human isolates from
2000 differ by approx. 3,000 SNPs from the 1988 food
isolate from facility X
• Why: a large recombination event that introduced a large prophage
(a virus inserted into the bacterial genome)
Outline
• Review of genomes, genes, and evolution
• Use of sequence data to assess relatedness of
organisms
• Data analysis approaches
• kSNP, wgMLST, and hqSNP
• Trees and how to interpret them
Use of sequence data to assess
relatedness of organisms
• Differences in sequences can be used to assess
relatedness of organisms and the likelihood of a recent
common ancestor
• “Do the M. tuberculosis isolates from patient A and patient B
share a recent common ancestor?”
• Definition of “recent” becomes important – recent in years or
in generation times
• Salmonella in a dry processing plant may stay dormant and rarely if
ever multiply (or imagine anthrax spores in soil)
• Salmonella in a chicken flock may multiply every 30 min (>7,500
times a year)
• Assessing relationships of microbial isolates typically
requires more information than just sequence data
• Information on epidemiological relationships and other relevant data is
essential
Outline
• Review of genomes, genes, and evolution
• Use of sequence data to assess relatedness of
organisms
• Data analysis approaches
• kSNP, wgMLST, hqSNP, and others
• Trees and how to interpret them
Basics of WGS Analyses
• Different ways to compare the genomes of 2 different
isolates
• Compare the genome small piece-by-small piece to find
pieces that are different
• Kmer based analyses
• Use a high quality (reference) sequence or genome to
identify differences
• hqSNP analysis
• Compare genomes on a gene-by-gene (locus-by-locus)
basis
• wgMLST analysis
• All these analyses can provide an output that gives the
“number of differences” or can be used to build trees
What makes a SNP high quality
(hq)?
[Figure: raw sequence reads pass through a quality filter to give quality-filtered sequence reads ready for analysis]
• Apply a quality filter that removes nucleotides from the sequence
reads used for comparison, based on sequence coverage and quality
The alphabet soup of analysis –
Coverage
Coverage at 40x
Coverage at 5x
http://missusrousselee.deviantart.com/art/AlphabetSoup-134724659
• NGS generates 100,000 or more reads per
genome sequenced
• Any single location on the genome can be covered by
zero to hundreds of sequence reads
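As a rough sketch of what "40x" or "5x" means, average coverage can be estimated as the number of reads times the read length divided by the genome length. The read length (250 bp) and genome size (3 Mb) below are assumptions chosen for illustration, not values from the slides.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_length: int) -> float:
    """Average number of reads covering each position of the genome."""
    return n_reads * read_length / genome_length


def reads_needed(target_coverage: float, read_length: int, genome_length: int) -> int:
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_length / read_length)


# Assumed values: 250 bp reads and a 3,000,000 bp genome
print(mean_coverage(100_000, 250, 3_000_000))  # about 8.3x
print(reads_needed(40, 250, 3_000_000))        # reads for 40x on average
```

This is only an average; as the slide notes, individual positions can sit well above or below it.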
What to call a SNP
• SNPs called based on:
• Quality
• Coverage
• Base frequency
• The differences between the reference and the compared genome
are extracted and used to determine relatedness
ATGTTACTC
ATGTTCCTC
ATGTTCCTC
ATGTTCCTC
ATGTTCCTC
ATGTTCCTC
ATGTTTCTC
ATGTTCCTC
ATGTTCCTC
ATGTTCCTC
ATGTTGCTC
ATGTTCCTC
ATGTTCCTC
ATGTTCCTC
ATGTTGCTC reference
Is it a SNP?
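One way to answer the question for the pile of reads above is to apply coverage and base-frequency thresholds to the column in question. The sketch below is illustrative only: the thresholds (`min_depth=10`, `min_freq=0.75`) are assumptions, base quality is ignored because the slide gives no quality scores, and `call_snp` is a made-up name rather than any pipeline's function.

```python
from collections import Counter


def call_snp(column_bases, ref_base, min_depth=10, min_freq=0.75):
    """Return the variant base if the column passes the filters, else None."""
    depth = len(column_bases)
    if depth < min_depth:
        return None                      # not enough coverage to decide
    base, count = Counter(column_bases).most_common(1)[0]
    if base != ref_base and count / depth >= min_freq:
        return base                      # consistent, well-supported difference
    return None


reads = [
    "ATGTTACTC", "ATGTTCCTC", "ATGTTCCTC", "ATGTTCCTC", "ATGTTCCTC",
    "ATGTTCCTC", "ATGTTTCTC", "ATGTTCCTC", "ATGTTCCTC", "ATGTTCCTC",
    "ATGTTGCTC", "ATGTTCCTC", "ATGTTCCTC", "ATGTTCCTC",
]
reference = "ATGTTGCTC"
column = 5  # 0-based index of the variable position

bases = [read[column] for read in reads]
print(Counter(bases))                        # 11 C, 1 A, 1 T, 1 G
print(call_snp(bases, reference[column]))    # C (11 of 14 reads, about 79%)
```

With a stricter frequency threshold (for example 0.95) the same column would not be called, which is why pipelines with different settings can report different SNP counts.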
Where to call a SNP?
• Not all SNP pipelines are equal – where you call
SNPs will affect the total SNP count
• SNPs relevant for phylogenetic analysis are
vertically transmitted, not horizontally, so horizontal
genetic elements like phages can be masked
[Figure: raw reads aligned to a genome containing genes and mobile elements]
• Mask mobile elements: do not consider SNPs in these locations
• Only call SNPs in genes
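Masking can be sketched as dropping every SNP whose position falls inside a listed region. The positions and the masked interval below are hypothetical numbers chosen for illustration, and `mask_snps` is a made-up helper name.

```python
def mask_snps(snp_positions, masked_regions):
    """Keep only SNP positions that lie outside all masked (start, end) regions.

    Regions are inclusive on both ends.
    """
    return [
        pos for pos in snp_positions
        if not any(start <= pos <= end for start, end in masked_regions)
    ]


# Hypothetical example: three SNPs fall inside a prophage at 4,800-5,500
snps = [120, 5000, 5100, 5200, 90000]
prophage_regions = [(4800, 5500)]

kept = mask_snps(snps, prophage_regions)
print(len(snps), "SNPs before masking,", len(kept), "after:", kept)
```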
How to report SNP data – keep it simple
• Hi folks:
New Cluster: 2016039
Two isolates are 0 SNPs from each other:
E2017003216 (SE77B52)
E2017003039 (SE77B52)
New Cluster: 2016040
Two isolates are 2 SNPs from each other:
E2017002910 (SE1B1)
I2017003132 (SE1B1)
Caveats of hqSNP analyses
Advantages
• Phylogenetically informative (builds a tree consistent with the evolution of the strains)
• SNP position can be identified on the genome (the gene affected can be identified)
Disadvantages
• Requires a closely related reference genome – hqSNP analysis is problematic if the reference genome is not closely related
• Takes a while and requires a lot of computer power
• Interpretation of data depends on the genomes added – it is not stable and does not lead to nomenclature
When to Use
• Good for situations where a wgMLST database has not been developed and validated
• May provide the highest amount of resolution for strain comparison
Traditional MLST
MLST.NET
• ~6-12 housekeeping genes; usually only a portion of each gene
• Developed in the era of Sanger sequencing, providing
improved discrimination over sequencing 1 gene
• Targets selected to represent population structure,
not as useful for outbreak detection
• Schemes are available on international publicly
accessible databases
• Combination of 6-12 genes used to name a unique
sequence type (e.g., MLST profile 1-1-1-1-1-1-1 =
ST1)
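The naming step can be pictured as a lookup from an allele profile to a sequence type. The sketch below contains only the one profile given on the slide; a real scheme's table would come from a public MLST database, and the function name `sequence_type` is an assumption for this example.

```python
# Profile (one allele number per housekeeping gene) -> sequence type.
# Only the entry from the slide is included here.
ST_TABLE = {
    (1, 1, 1, 1, 1, 1, 1): "ST1",
}


def sequence_type(profile, table=ST_TABLE):
    """Return the ST for a known profile, or 'novel' if it is not in the table."""
    return table.get(tuple(profile), "novel")


print(sequence_type([1, 1, 1, 1, 1, 1, 1]))  # ST1
print(sequence_type([1, 1, 2, 1, 1, 1, 1]))  # novel (one allele differs)
```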
Whole genome multilocus
sequence typing (wgMLST)
• Database is built from gene content representing a diverse
selection of the genus/species of the organism being
compared
• Each unique gene is referred to as a “locus” – a locus may
include the entire gene or a piece of the gene
• Any change – SNP, insertion, or deletion – results in a new
allele call for a locus
• New alleles are numbered sequentially as they are encountered,
not based on their sequence
Locus 1
ACTAGAGGGAAA   allele 1
ACTAGAGGCTAA   allele 2 (2 SNPs)
ACT-GAGGGAAA   allele 3 (1 indel)
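Sequential allele naming can be sketched with a dictionary that hands out the next free number the first time a sequence is seen. This is a simplified illustration, not how any particular wgMLST database is implemented; `assign_allele` is a made-up name, and the indel allele is stored without its alignment gap.

```python
def assign_allele(sequence: str, known_alleles: dict) -> int:
    """Return the allele number for a sequence, creating a new one if needed.

    Numbers are assigned in order of first appearance, so they carry no
    information about how similar two alleles are.
    """
    if sequence not in known_alleles:
        known_alleles[sequence] = len(known_alleles) + 1
    return known_alleles[sequence]


locus_1 = {}
print(assign_allele("ACTAGAGGGAAA", locus_1))  # 1
print(assign_allele("ACTAGAGGCTAA", locus_1))  # 2 (differs by 2 SNPs)
print(assign_allele("ACTGAGGGAAA", locus_1))   # 3 (differs by 1 indel)
print(assign_allele("ACTAGAGGCTAA", locus_1))  # 2 again, already known
```

Note that the 2-SNP change and the 1-indel change each produce exactly one new allele number.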
Whole genome multilocus
sequence typing (wgMLST)
• Allows for simpler analysis and clear naming of
subtypes
• Performs comparison on a gene by gene level
Locus               Isolate A   Isolate B   Isolate C
Locus 1 (20 nt)     1           1           1
Locus 2 (100 nt)    8           8           12
Locus 3 (5000 nt)   5           5           2
Etc.
Locus 2,005 (5 nt)  4           4           4
wgMLST type         A           A           B
The alphabet soup of analysis – wgMLST
• The allele calls at each locus are compared between isolates
and differences are used to determine relatedness
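The comparison can be sketched as counting the loci at which two isolates carry different allele numbers. The profiles below use the four loci shown in the table for Isolates A, B, and C; skipping loci without a call in either isolate is an assumption of this sketch, and `allele_differences` is a made-up name.

```python
def allele_differences(profile_a: dict, profile_b: dict) -> int:
    """Count loci with different allele numbers in two isolates.

    Loci missing (or None) in either profile are skipped.
    """
    shared = set(profile_a) & set(profile_b)
    return sum(
        1 for locus in shared
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


isolate_a = {"locus1": 1, "locus2": 8, "locus3": 5, "locus2005": 4}
isolate_b = {"locus1": 1, "locus2": 8, "locus3": 5, "locus2005": 4}
isolate_c = {"locus1": 1, "locus2": 12, "locus3": 2, "locus2005": 4}

print(allele_differences(isolate_a, isolate_b))  # 0
print(allele_differences(isolate_a, isolate_c))  # 2
```

Locus length does not matter here: a change in the 100 nt locus and a change in the 5000 nt locus each count as one allele difference.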
“Allele Code” Pattern Naming in
the Listeria Database
• Pilot thresholds
― 10% = 300 alleles
― 5% = 150 alleles
― 2.5% = 75 alleles
― 1% = 30 alleles
― 0.5% = 15 alleles
― 0.25% = 7 alleles
• Two isolates are the same:
Patient 1: 4.1.1.5.2
Patient 2: 4.1.1.5.2
The wgMLST “zip code”
• Two isolates are the same:
• Patient 1: 1.4.1.1.5.2
• Patient 2: 1.4.1.1.5.2
• Three isolates; patient 3 differs by 1 to 7 alleles from 1 and 2
• Patient 1: 1.4.1.1.5.2
• Patient 2: 1.4.1.1.5.2
• Patient 3: 1.4.1.1.5.4
• Four isolates; patient 4 differs by 8 to 15 alleles from the others:
• Patient 1: 1.4.1.1.5.2
• Patient 2: 1.4.1.1.5.2
• Patient 3: 1.4.1.1.5.4
• Patient 4: 1.4.1.1.7.1
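Reading two "zip codes" against each other amounts to finding the first field in which they differ. The sketch below does only that. It maps a differing field to an allele range only for the two cases stated on the slide (last field: 1 to 7 alleles; fifth field: 8 to 15 alleles); ranges for earlier fields are deliberately left undefined, and the function names are made up for this example.

```python
def first_difference(code_a: str, code_b: str):
    """Return the 1-based position of the first differing field, or None."""
    fields_a = code_a.split(".")
    fields_b = code_b.split(".")
    for position, (a, b) in enumerate(zip(fields_a, fields_b), start=1):
        if a != b:
            return position
    return None


# Only the two ranges given on the slide; other positions are not defined here.
SLIDE_RANGES = {6: (1, 7), 5: (8, 15)}


def describe(code_a: str, code_b: str) -> str:
    position = first_difference(code_a, code_b)
    if position is None:
        return "same allele code"
    if position in SLIDE_RANGES:
        low, high = SLIDE_RANGES[position]
        return f"differ in field {position}: {low} to {high} alleles apart"
    return f"differ in field {position}: range not given on the slide"


print(describe("1.4.1.1.5.2", "1.4.1.1.5.2"))  # same allele code
print(describe("1.4.1.1.5.2", "1.4.1.1.5.4"))  # field 6
print(describe("1.4.1.1.5.2", "1.4.1.1.7.1"))  # field 5
```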
How to report wgMLST data – keep it
simple
• Hi folks:
New Cluster: 2016039
Two isolates are 0 alleles from each other:
E2017003216 (SE77B52)
E2017003039 (SE77B52)
New Cluster: 2016040
Two isolates are 2 alleles from each other:
E2017002910 (SE1B1)
I2017003132 (SE1B1)
How to report wgMLST data – give me
the ZIP codes
• Looks like we may have a cluster
• Patient 1: 1.4.1.1.5.2
• Patient 2: 1.4.1.1.5.2
• Patient 3: 1.4.1.1.5.4
• Patient 4: 1.4.1.1.7.1
• Patient 5: 1.4.3.3.1.1
MLST Analysis
• Faster than analyzing SNP differences
• For WGS data, allele calls can be performed
on short reads (“assembly free”) and
assembled genomes (“assembly-based”)
• If there is a conflict between the allele calls then no
allele call is made
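The conflict rule can be written as a one-line check. This is a minimal sketch: treating a missing call from either method as "no call" is an assumption of the example, and `consensus_allele` is a made-up name.

```python
def consensus_allele(assembly_free, assembly_based):
    """Return the allele number only if both methods agree, else None.

    None stands for 'no allele call'. A call missing from one method is
    treated as a disagreement (an assumption of this sketch).
    """
    if assembly_free is not None and assembly_free == assembly_based:
        return assembly_free
    return None


print(consensus_allele(8, 8))     # 8
print(consensus_allele(8, 12))    # None, the calls conflict
print(consensus_allele(8, None))  # None, only one method made a call
```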
Advantages and Caveats of wgMLST analysis
Advantages
• Phylogenetically informative
• All virulence, serotyping, and antibiotic resistance genes can be pulled out as part of the analysis
• Neutralizes the effects of horizontal gene transfer (an event is only counted once rather than many times as for hqSNPs)
• Allele calling is stable: data are standardizable and directly comparable between laboratories; can lead to nomenclature based on allele calls, which can be used for communication and automated cluster detection; reproducibility does not depend on the choice of reference strain; amenable to automated bioinformatics
Disadvantages
• Initial assignment of alleles is computationally costly (doing assemblies before calling alleles); CDC's system will call alleles directly from raw reads (~2 min); assemblies take about 2 h or perhaps longer; if there is a conflict between the allele calls then no allele call is made
• Comparing character data (allele numbers) rather than genetic data
• SNPs and indels treated equally
• Requires curation for allele calls
When to Use
• Surveillance, especially for a distributed testing network
• Reference characterization
• Accurate cluster detection
• Need to communicate with partners using stable nomenclature
hqSNP versus MLST Analysis
• Both analyses conducted from the same raw
data (typically short read sequencing data)
• For public health purposes, both correlate well
• i.e., the outermost branches of the phylogenetic trees
are almost identical
• The two are not mutually exclusive
• For some use cases MLST works better; for others,
hqSNP works better
Interpreting analysis data – how to
build trees using WGS analysis
• Use WGS analysis to infer relatedness of isolates
• For wgMLST: translate the number of allele differences
between isolates into a measure of similarity and use
that to infer branch lengths and relatedness
• For hqSNP analysis – translate nucleotide differences
between isolates to relatedness
• Can use substitution models to estimate the cost of changing
from A>T, C>A, etc.
[Figure: substitutions among the four nucleotides: adenine, guanine, cytosine, thymine]
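One simple way to give different changes different costs is to separate transitions (purine to purine, or pyrimidine to pyrimidine) from transversions (purine to pyrimidine or the reverse). The sketch below is illustrative: the cost values 1.0 and 2.0 are arbitrary assumptions, not a fitted substitution model, and the function names are made up for this example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(a: str, b: str) -> str:
    """Classify a change between two bases."""
    if a == b:
        return "none"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def weighted_distance(seq1: str, seq2: str,
                      transition_cost: float = 1.0,
                      transversion_cost: float = 2.0) -> float:
    """Sum of per-site costs between two aligned sequences (costs are assumed)."""
    costs = {"none": 0.0,
             "transition": transition_cost,
             "transversion": transversion_cost}
    return sum(costs[substitution_type(a, b)] for a, b in zip(seq1, seq2))


print(substitution_type("A", "T"))   # transversion
print(substitution_type("C", "A"))   # transversion
print(substitution_type("A", "G"))   # transition
print(weighted_distance("ATATTCCGCAA", "ATATTGCGCAA"))  # one C>G transversion
```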
How to report SNP data - trees
[Figure: pairwise SNP difference matrix and tree for four sequences]
1 ATATTCCGCAA
2 ATATTCCGCAA
3 ATATTGCGCAA
4 ACCTTGCGCTA
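A pairwise SNP matrix for the four sequences on this slide (ATATTCCGCAA, ATATTCCGCAA, ATATTGCGCAA, ACCTTGCGCTA) can be computed with a short sketch. It assumes the sequences are labeled 1 to 4 in the order shown and that they are already aligned; the function names are made up for this example.

```python
def snp_distance(a: str, b: str) -> int:
    """Number of differing columns between two aligned sequences."""
    return sum(1 for x, y in zip(a, b) if x != y)


def distance_matrix(sequences):
    """Symmetric matrix of pairwise SNP distances."""
    return [[snp_distance(a, b) for b in sequences] for a in sequences]


sequences = ["ATATTCCGCAA", "ATATTCCGCAA", "ATATTGCGCAA", "ACCTTGCGCTA"]

for label, row in zip("1234", distance_matrix(sequences)):
    print(label, row)
```

Sequences 1 and 2 are identical, sequence 3 is one SNP away from both, and sequence 4 is the most distant, which is the information a tree built from this matrix would display.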
Building the tree
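Isolate    Sequence
A          ggagagtta
B          ggatccccc
C          ggattatta
D          actgccggt
ancestor   actgaatta
[Figure: tree rooted at the ancestor (actgaatta) with inferred internal nodes ggagaatta and ggataatta; branch labels give the number of genetic changes: 1 to Isolate A, 6 to Isolate B, 1 to Isolate C, 5 to Isolate D, and 3 and 1 on the internal branches]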
• Use the differences you identified by hqSNP or
wgMLST to infer the relatedness or phylogeny
Reading the trees
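[Figure: the tree from “Building the tree” annotated with the terms below]
• Node (ancestral node): e.g., the most recent common ancestor of Isolates B and C
• Leaf (terminal node, taxa): an isolate at the tip of a branch
• Clade: an ancestral node together with all of its descendants
• Branch labels: amount of genetic change
• Outgroup/Root: a related isolate (same PFGE pattern or 7-gene MLST) that is not part of the outbreak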
Trees, branches, and leaves –
more than one way to draw a tree
[Figure: one set of isolates drawn as trees in different layouts]
• The branches that connect to the terminal nodes carry the
branch lengths that matter most for judging relatedness
Many different ways to display trees
[Figure: the same isolates shown in several different tree layouts]
Trees, branches and leaves –
reading the trees
• Difference between similarity and relatedness on the tree
• Isolates A and C are more similar to each other than C and B are
• Isolates C and B are more closely related to each other than C and A are
[Figure: the tree from “Building the tree”, repeated]
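The distinction can be checked with the sequences from "Building the tree" (A ggagagtta, B ggatccccc, C ggattatta, D actgccggt). In the sketch below, similarity is the raw count of differing sites, while relatedness is read from the tree as how deep the most recent common ancestor sits. The parent map encodes the tree topology as read from the slide's figure (an interpretation of that figure), and the function names are made up for this example.

```python
def differences(a: str, b: str) -> int:
    """Number of differing sites between two aligned sequences."""
    return sum(1 for x, y in zip(a, b) if x != y)


def path_to_root(node, parent):
    """List of nodes from `node` up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def mrca(x, y, parent):
    """Most recent common ancestor of two leaves."""
    ancestors_of_y = set(path_to_root(y, parent))
    for node in path_to_root(x, parent):
        if node in ancestors_of_y:
            return node
    return None


seqs = {"A": "ggagagtta", "B": "ggatccccc", "C": "ggattatta", "D": "actgccggt"}

# Topology as read from the slide; internal nodes are named by their sequence.
parent = {
    "D": "actgaatta",
    "ggagaatta": "actgaatta",
    "A": "ggagaatta",
    "ggataatta": "ggagaatta",
    "B": "ggataatta",
    "C": "ggataatta",
}

# Similarity: A and C differ at fewer sites than C and B
print(differences(seqs["A"], seqs["C"]), differences(seqs["C"], seqs["B"]))

# Relatedness: the ancestor shared by B and C is more recent (further from
# the root) than the ancestor shared by A and C
for pair in (("C", "B"), ("C", "A")):
    node = mrca(*pair, parent)
    print(pair, node, "depth", len(path_to_root(node, parent)) - 1)
```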
Trees, branches and leaves – what does
it mean for my outbreak investigation
• Epidemiologic data provide context for the tree
– the phylogenetic tree alone cannot be relied on to identify
the outbreak source
[Figure: the tree from “Building the tree” with the leaves relabeled by isolate source: stool, stool, kale, spinach]
wgMLST–based phylogenetic
Tree
• Minimum spanning tree (MST)
[Figure: minimum spanning tree with numeric branch labels (1.00 to 169.00); annotated groups include “Crave Brothers”, “New subgroup”, and “kale”]
• Unrooted
• Depicts genomes in a network; branch lengths show relatedness
of isolates (number of allele differences)
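A minimum spanning tree links all isolates using the set of connections with the smallest total number of allele differences. The sketch below uses Prim's algorithm on hypothetical allele profiles; the isolate names and allele numbers are invented for illustration, and this is not the implementation used by any particular software package.

```python
def allele_distance(p, q) -> int:
    """Number of loci with different allele numbers."""
    return sum(1 for a, b in zip(p, q) if a != b)


def minimum_spanning_tree(profiles: dict):
    """Prim's algorithm. Returns a list of (isolate_u, isolate_v, distance) edges."""
    names = sorted(profiles)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge joining the tree to an isolate not yet in it;
        # ties are broken alphabetically by isolate name.
        dist, u, v = min(
            (allele_distance(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges


# Hypothetical allele profiles (five loci each)
profiles = {
    "iso1": (1, 8, 5, 4, 2),
    "iso2": (1, 8, 5, 4, 3),
    "iso3": (1, 12, 2, 4, 3),
    "iso4": (2, 12, 2, 7, 3),
}

for u, v, dist in minimum_spanning_tree(profiles):
    print(f"{u} - {v}: {dist} allele difference(s)")
```

Because the result is unrooted, it shows which isolates are closest to each other but not the direction of descent.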
[Figure: SNP-based tree of isolates MDH00202 to MDH00256, including sporadic isolates (2000 to 2013), in-vivo isolates, two OH samples, and an environmental sample from Outbreak 5; bracketed clusters are labeled 0 SNPs, 1 SNP, 0-1 SNPs, 0-2 SNPs, and 0-3 SNPs]
• Some isolates labeled sporadic share the PFGE pattern and time frame of an outbreak: MDH00208 and MDH00213 (Outbreak 1), MDH00241 and MDH00243 (Outbreak 5)
Defined Outbreak Samples
• Outbreak 1 - Sept 2000
• Outbreak 2 - May 2001
• Outbreak 3 - Aug 2001
• Outbreak 4 - Nov 2003
• Outbreak 5 - Aug 2008
• Outbreak 6 - Spring 2014
• Outbreak 7 - Spring 2014
Taylor et al. J Clin Micro Oct 2015.
Take Home Messages
• Molecular epidemiology requires collaborations
between epidemiologists and the lab
• Microbial isolates can accumulate genetic
differences through a variety of mechanisms
(e.g., horizontal gene transfer)
• How a data analysis approach handles (or fails to handle)
these different evolutionary mechanisms can
play an important role
• hqSNP and wgMLST both address and account for
horizontal gene transfer, but in different ways
• Different organisms differ in their lifestyles and
mechanisms of evolution
• Need to know your epi and your bugs
Acknowledgments
• Centers for Disease Control and Prevention
• Heather Carleton
• Greg Armstrong
• Peter Gerner-Smidt
• John Besser
• Integrated Food Safety Centers of Excellence
Questions