Genome-based identification of bacterial pathogens

Michel DRANCOURT, MD, PhD
Unité de Recherche sur les Maladies
Infectieuses et Tropicales Emergentes
Marseille Medical School, Marseille, France
http://www.mediterranee-infection.com/
The diagnosis of infectious diseases
Nature Rev Microbiol. 2004;2:151-9.
Accurate Identification of Bacteria
• Accurate diagnosis of the disease
• Prognosis
• Treatment
• Secondary prophylaxis
• Further investigations
Traditional Identification
• Isolation and culture
• Culture characteristics
• Gram staining
• Simple biochemical tests: catalase, oxidase, urease
• Biochemical profiling
  - Enzyme activity profile
  - Auxanogram
When do we use a molecular identification tool?
Uncultured / fastidious bacteria
Phenotypically inert bacteria
New bacterial species
Unusual bacteria
Usual bacteria from an unusual clinical specimen or situation
Annual numbers of sequenced genomes until October 15th, 2012
[Figure: bar chart, one bar per year from 1995 to 2012; y-axis from 0 to 1,400 genomes; bar labels range from 2 to 1,274.]
Burden of bacterial isolates requiring molecular identification

Bacteria                 No. of isolates   No. (%) of isolates identified   No. of rare or
                         tested            by 16S rDNA analysis             unique isolates (a)
Gram-positive cocci      75,537            300 (0.40)                       6
Gram-positive rods (b)   16,487            524 (3.18)                       86
Gram-negative bacteria
  Enteric bacteria       51,177            132 (0.26)                       3
  Other bacteria         26,357            225 (0.85)                       14
Anaerobic bacteria       5,780             223 (3.86)                       11
Total                    175,338           1,404 (0.80)                     120

(a) Isolates from human samples that have been reported 0 to 10 times.
(b) Including Mycobacterium spp.
[Drancourt M. et al. J Clin Microbiol. 2004;42:2197]
Molecular Identification of Bacteria
• DNA Sequencing
• DNA Hybridization
• Real-time PCR, SYBR Green
• Real-time PCR, probe
• Mass spectrometry
Identification of Bacteria by DNA Sequencing: Flow chart
Isolate
→ DNA extraction
→ Target PCR amplification
→ Sequencing
→ Sequence analysis / comparison against databases
→ Identification / phylogenetic analysis
DNA extraction: a wide range of protocols
• No extraction: the thermocycler will do the job!
• Cell-wall lysis:
  * Mechanical
    - Heat
    - Glass beads
  * Enzymatic
    - Proteinase K
    - Lysozyme
• Cell-wall lysis plus DNA extraction
DNA extraction :
a wide range of protocols
• Manual protocol
• Semi-automatic protocol
• Automatic protocol
DNA Sequencing
Capillary sequencer
Pyrosequencer
Mass-spectrometer
High-throughput pyrosequencer
Perform 16S rDNA based identification of
isolates using BLAST
>Mabscessus
TAGAGTTTGATCCTGGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGA
AAGGCCCTTCGGGGTACTCGAGTGGCGAACGGGTGAGTAACACGTGGGTGATCTGCCCTGCACT
CTGGGATAAGCCTGGGAAACTGGGTCTAATACCGGATAGGACCACACACTTCATGGTGAGTGGT
GCAAAGCTTTTGCGGTGTGGGATGAGCCCGCGGCCTATCAGCTTGTTGGTGGGGTAATGGCCCA
CCAAGGCGACGACGGGTAGCCGGCCTGAGAGGGTGACCGGCCACACTGGGACTGAGATACGGCC
CAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCGAC
GCCGCGTGAGGGATGACGGCCTTCGGGTTGTAAACCTCTTTCAGTAGGGACGAAGCGAAAGTGA
CGGTACCTACAGAAGAAGGACCGGCCAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTCCG
AGCGTTGTCCGGAATTACTGGGCGTAAAGAGCTCGTAGGTGGTTTGTCGCGTTGTTCGTGAAAA
CTCACAGCTTAACTGTGGGCGTGCGGGCGATACGGGCAGACTAGAGTACTGCAGGGGAGACTGG
AATTCCTGGTGTAGCGGTGGAATGCGCAGATATCAGGAGGAACACCGGTGGCGAAGGCGGGTCT
CTGGGCAGTAACTGACGCTGAGGAGCGAAAGCGTGGGTAGCGAACAGGATTAGATACCCTGGTA
GTCCACGCCGTAAACGGTGGGTACTAGGTGTGGGTTTCCTTCCTTGGGATCCGTGCCGTAGCTA
ACGCATTAAGTACCCCGCCTGGGGAGTACGGTCGCAAGACTAAAACTCAAAGGAATTGACGGGG
GCCCGCACAAGCGGCGGAGCATGTGGATTAATTCGATGCAACGCGAAGAACCTTACCTGGGTTT
GACATGCACAGGACGTACCTAGAGATAGGTATTCCCTTGTGGCCTGTGTGCAGGTGGTGCATGG
CTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTCCTA
TGTTGCCAGCGGGTAATGCCGGGGACTCGTAGGAGACTGCCGGGGTCAACTCGGAGGAAGGTGG
GGATGACGTCAAGTCATCATGCCCCTTATGTCCAGGGCTTCACACATGCTACAATGGCCAGTAC
AGAGGGCTGCGAAGCCGTAAGGTGGAGCGAATCCCTTAAAGCTGGTCTCAGTTCGGATTGGGGT
CTGCAACTCGACCCCATGAAGTCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGAAT
ACGTTCCCGGGCCTTGTACACACCGCCCGTCACGTCATGAAAGTCGGTAACACCCGAAGCCAGT
GGCCTAACCTTTTGGAGGGAGCTGTCGAAGGTGGGATCGGCGATTGGGACGAAGTCGTAACAAG
GTAGCCGTA
http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi
16S rDNA-based identification: rapidly growing mycobacteria example
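Before pasting a record into the BLAST form, it can help to check that it parses as FASTA and contains only the expected bases. The following is a minimal Python sketch; the function name `parse_fasta` is an illustrative choice, and the demo record is only the first two lines of the Mabscessus sequence shown in the exercise.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {header: "".join(parts) for header, parts in records.items()}


# Excerpt only (first two sequence lines); paste the full record in practice.
demo = """>Mabscessus
TAGAGTTTGATCCTGGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGA
AAGGCCCTTCGGGGTACTCGAGTGGCGAACGGGTGAGTAACACGTGGGTGATCTGCCCTGCACT
"""

records = parse_fasta(demo)
for header, seq in records.items():
    unexpected = sorted(set(seq) - set("ACGT"))
    print(header, len(seq), "bp; non-ACGT symbols:", unexpected)
```

The length check matters because a short or truncated read gives a less informative BLAST comparison than an almost complete 16S rDNA sequence.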
Identification of Bacteria by DNA Sequencing:
The choice of the molecular target
• Universal target
• Genus / Species-specific target
• Genotype-specific target
Atypical phenotype
- fastidious bacterium,
- intracellular bacterium,
- biochemically poorly reactive,
- discrepant antibiotic susceptibility pattern
16S rDNA sequence BLAST analysis
- almost entire 16S rDNA gene
- less than 0.5 % ambiguous positions
- sequence similarity compared with the 98.7 % cutoff
Complete phenotypic and molecular characterisation
- rpoB gene sequencing
- 23S rDNA sequencing
- ITS
- groEL gene sequencing
- recA gene sequencing
- gyrB gene sequencing
[Adapted from Drancourt M, Raoult D. J Clin Microbiol. 2005; 43: 4311-5]
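The decision step of this flow chart can be sketched in Python. The function names and returned labels are hypothetical, and the reading that a similarity at or above 98.7 % points to a described species, while a lower value calls for complete characterisation, is an assumption made for illustration.

```python
def ambiguous_fraction(seq):
    """Fraction of positions that are not A, C, G or T."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base not in "ACGT" for base in seq) / len(seq)


def triage_16s(seq, best_hit_similarity, cutoff=98.7, max_ambiguous=0.005):
    """Classify a 16S rDNA BLAST result.

    ASSUMPTION: similarity >= cutoff is read as a match to a described
    species; anything lower is sent to complete characterisation.
    """
    # The slide requires less than 0.5 % ambiguous positions.
    if ambiguous_fraction(seq) >= max_ambiguous:
        return "resequence"
    if best_hit_similarity >= cutoff:
        return "assign to best-hit species"
    return "complete phenotypic and molecular characterisation"


clean_read = "ACGT" * 300   # toy 1,200-base read without ambiguities
noisy_read = "ACGN" * 300   # toy read with 25 % ambiguous positions
print(triage_16s(clean_read, 99.4))
print(triage_16s(clean_read, 97.9))
print(triage_16s(noisy_read, 99.4))
```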
BACTERIAL GENOME ARCHITECTURE
chromosome
plasmid
[Audic S. et al. PLoS Genet. 2007;3:e138]
[Lescot M et al. PLoS Genet. 2008;4:e1000185]
BACTERIAL GENOME PLASTICITY:
LATERAL GENE TRANSFER
GENOME
[Raoult D. et al. Genome Res. 2003; 13:1800-9]
16S rDNA - based universal detection / identification
of bacteria: a powerful tool in medical microbiology
16S rDNA : first-line target for universal
identification of Bacteria
• Universal, present in all bacteria
• Universal primers (positions 1 to 1500): fD1, 357f, 357r, 536f, 536r, 800f, 800r, 1050f, 1050r, rP2
• ~ 1,500 bp long: easily sequenced
16S rDNA / DNA-DNA Hybridization plot
indicates a robust 98.7 % 16S rDNA sequence
similarity value for identification
[Stackebrandt E, Ebers J. Microbiol. Today 2006; 33:152]
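To make the 98.7 % value concrete, percent identity can be computed from a pairwise alignment. This is a minimal sketch; counting a gap against a base as a mismatch is an assumption (alignment tools differ on this), and the toy sequences are not real 16S rDNA.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical columns in a pairwise alignment ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Drop columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned positions")
    matches = sum(a == b and a != "-" for a, b in columns)
    return 100.0 * matches / len(columns)


# Toy 1,500-column alignments: 19 versus 20 mismatches.
reference = "A" * 1500
with_19_mismatches = "A" * 1481 + "C" * 19
with_20_mismatches = "A" * 1480 + "C" * 20

for query in (with_19_mismatches, with_20_mismatches):
    identity = percent_identity(reference, query)
    print(f"{identity:.2f} %", "meets 98.7 %" if identity >= 98.7 else "below 98.7 %")
```

On a ~1,500 bp gene, the cutoff therefore corresponds to about 19 differing positions.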
16S rDNA Databases

Database         Quality control     Access       Scope                                    Source
GenBank          Poorly controlled   Free         General microbiology                     http://www.ncbi.nlm.nih.gov/Genbank
EzTaxon          Controlled          Free         General microbiology                     http://www.eztaxon.org, 8,369 entries [Chun J. et al. IJSEM 2007; 57:2259]
RDP              Controlled          Free         General microbiology                     rdp.cme.msu.edu
MicroSeq®        Controlled          Commercial   Medical and environmental microbiology   Applied Biosystems
RIDOM 16S rDNA   Controlled          Commercial   Medical microbiology                     Ridom GmbH, rdna.ridom.de [Harmsen D. et al. Nucleic Acids Res. 2002; 30:416]
16S rDNA – based achievement:
description of new bacterial species
(IJSEM 2007-2008 / Total = 35)
16S rDNA – based description of new medical species :
sources of isolates (Total = 35)
16S rDNA – based description of
new species of medical interest (Total = 35)
16S rDNA – based: querying the microbial
nature of microscopic particles
Lack of detection of the universal 16S rDNA agrees
with the non-microbial nature of microscopic
particles
• « Nanobacterium spp. »
[Raoult D. et al. PLoS Pathog. 2008; 4:e41]
• Mimivirus
[Raoult D. et al. Science 2004; 306:1344-50]
Limits of 16S rDNA - based
universal identification of bacteria
• 16S rDNA sequence non-discriminant
- Lack of specific variability
- Multicopy
- Lateral gene transfer
• Laboratory contamination
• Databases
16S rDNA limitations
Sequence heterogeneity between the two copies
of the 16S rDNA sequence
M. mucogenicum
ATCC 49649
(Adékambi and Drancourt, 2004)
Erroneous identification may be due to 16S rDNA point
mutations associated with aminoglycoside resistance
M. abscessus and M. chelonae
(Prammananan et al. 1998)
16S rDNA limitations: mixed culture
16S rDNA limitations
16S rDNA underestimates diversity of RGM
M. senegalense = M. houstonense
16S rDNA resolving power is insufficient to
guarantee correct delineation
« It is doubtful whether phylogenetic relationships should be
based solely on the 16S rRNA in cases where sequence
identities are > 99% » (Drancourt et al. 2000)
M. wolinskyi belongs to the M. smegmatis group
Why develop alternative molecular
tools for universal identification?
• Secondary target to confirm 16S rDNA based
detection / identification
• Refining molecular identification in case of 16S
rDNA ambiguity
• By-passing 16S rDNA amplicon contamination
Alternative universal molecular tools for
bacteria detection / identification
Target                                           Available sequences
rpoB (β-subunit of RNA polymerase)               1,700
23S rDNA                                         5,400
16S-23S rDNA internal transcribed spacer (ITS)   5,400
groEL (heat-shock protein)                       2,700
gyrB (β-subunit of DNA gyrase)                   3,700
recA (homologous recombination gene)             3,790
gltA (citrate synthase gene)                     835
rpoB: an alternative tool for the
detection / identification of bacteria
• It is an open reading frame (O.R.F.)
• Encodes the β-subunit of RNA polymerase
• 4,000 – 4,500 bp
• Universal
• Unique copy, except for Nocardia farcinica (Ishikawa J. et al. Proc Natl Acad Sci USA, 2004;101:14925-30)
rpoB-based RGM phylogeny [Iyer et al., 2005]
rpoB sequence-based taxonomy of RGM

Genes              Bootstrap > 80 %   p value with rpoB
Combined 5 genes   82.4 %             1
rpoB               82.4 %             1
recA               64.7 %             0.4
16S rRNA           56.3 %             0.1
hsp65              47.1 %             0.03
sodA               41.2 %             0.01
RGM taxonomy overview : rpoB-based
Genome Diagnostic and Epidemiology
Florence Fenollar and Didier Raoult. APMIS. 2004;112:785-807.
Fournier PE et al. Lancet Infect Dis. 2007;7:711-23.
ESTIMATION OF GENOMIC G + C CONTENT
• The G + C content is a global estimator of the genome.
• Genomic G + C content (GCg) can be estimated from the rpoB gene G + C content (GCr): GCg = 1.2065 × GCr − 11.495 [Fournier PE. et al. Int J Syst Evol Microbiol. 2006; 56:1025-9]
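The regression above is easy to apply once the G + C percentage of the rpoB sequence is known. This is a minimal Python sketch; the function names are illustrative, and the input value of 60 % is a made-up example, not a measured rpoB composition.

```python
def gc_percent(seq):
    """G + C percentage of a DNA sequence, ignoring non-ACGT symbols."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("no A, C, G or T in sequence")
    return 100.0 * sum(base in "GC" for base in counted) / len(counted)


def genomic_gc_from_rpob(gc_rpob):
    """Apply GCg = 1.2065 x GCr - 11.495 (both values in percent)."""
    return 1.2065 * gc_rpob - 11.495


# Hypothetical rpoB G + C content of 60 %.
print(f"estimated genomic G + C: {genomic_gc_from_rpob(60.0):.1f} %")
```

In practice `gc_percent` would be run on the sequenced rpoB gene and its result passed to `genomic_gc_from_rpob`.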
WHOLE GENOME COMPARISONS
IN BACTERIA
• Experimental: DNA-DNA hybridization > 70 %
• Bio-informatics: Average Nucleotide Identity (ANI) > 95 % [Goris J et al. DNA-DNA hybridization values and their relationship to whole-genome sequence
similarities. Int J Syst Evol Microbiol. 2007;57:81-91; Konstantinidis KT et al.
Toward a more robust assessment of intraspecies diversity, using fewer
genetic markers. Appl Environ Microbiol. 2006;72:7286-93].
• Bio-informatics: rpoB gene sequence similarity > 97.7 % [Adékambi T et al.
Complete rpoB gene sequencing as a suitable supplement to DNA-DNA
hybridization for bacterial species and genus delineation. Int J Syst Evol
Microbiol. 2008;58:1807-14].
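The three cutoffs can be gathered in one small lookup. This is a sketch only; the dictionary keys, the function name and the example values are illustrative assumptions, and the strict ">" comparison simply mirrors the way the thresholds are written on the slide.

```python
# Species-level cutoffs (percent) as listed above.
SPECIES_CUTOFFS = {
    "ddh": 70.0,    # DNA-DNA hybridization
    "ani": 95.0,    # Average Nucleotide Identity
    "rpob": 97.7,   # rpoB gene sequence similarity
}


def same_species(method, value):
    """Return True when the value exceeds the cutoff for the given method."""
    if method not in SPECIES_CUTOFFS:
        raise ValueError(f"unknown method: {method!r}")
    return value > SPECIES_CUTOFFS[method]


# Hypothetical measurements for a pair of isolates.
for method, value in [("ddh", 82.0), ("ani", 93.5), ("rpob", 98.1)]:
    print(method, value, same_species(method, value))
```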
ANI (%): the Mycobacterium tuberculosis complex

genome \ reference                        (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)   (10)   (11)
(1) M. tuberculosis CDC1551                 -  99.87  99.75  99.73  79.49  79.56  79.12  79.15  71.39  74.17  76.14
(2) M. tuberculosis H37Rv               99.89      -  99.76  99.71  79.40  79.56  79.08  79.06  71.34  74.08  76.17
(3) M. bovis AF2122/97                  99.83  99.80      -  99.90  79.54  79.60  79.04  79.10  71.35  74.13  76.17
(4) M. bovis BCG 1173P2                 99.83  99.80  99.93      -  79.53  79.61  79.07  79.12  71.35  74.11  76.17
(5) M. avium hominissuis                78.72  78.71  78.81  78.82      -  97.90  78.23  78.28  72.16  75.22  75.35
(6) M. avium subsp. paratuberculosis    79.13  79.14  79.21  79.22  98.71      -  78.42  78.40  72.21  75.40  75.38
(7) M. ulcerans                         79.05  79.05  79.04  79.05  78.44  78.86      -  98.90  71.49  74.09  74.95
(8) M. marinum                          78.50  78.52  78.42  78.47  78.39  78.44  98.06      -  71.44  73.72  74.49
(9) M. abscessus                        71.19  71.20  71.19  71.20  72.10  72.11  71.11  71.30      -  72.09  69.79
(10) M. smegmatis                       73.95  73.96  73.97  73.97  75.47  75.52  73.72  73.68  72.04      -  71.45
(11) M. leprae                          78.87  78.88  78.89  78.89  78.08  78.08  77.13  77.18  70.66  72.84      -
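A few values from this matrix can be grouped automatically by linking every pair whose ANI exceeds 95 %. This is a minimal Python sketch using single-linkage grouping with a small union-find; the function name is hypothetical, only one direction (genome versus reference) of each pair is used, and the output reflects the numeric cutoff alone, not a taxonomic opinion.

```python
def group_by_ani(pairs, cutoff=95.0):
    """Single-linkage grouping: link two genomes when their ANI > cutoff."""
    parent = {}

    def find(name):
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for (genome, reference), ani in pairs.items():
        root_a, root_b = find(genome), find(reference)
        if ani > cutoff and root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for name in parent:
        groups.setdefault(find(name), set()).add(name)
    return sorted((sorted(members) for members in groups.values()),
                  key=lambda members: members[0])


# Selected (genome, reference) values taken from the matrix.
pairs = {
    ("M. tuberculosis CDC1551", "M. tuberculosis H37Rv"): 99.87,
    ("M. tuberculosis CDC1551", "M. bovis AF2122/97"): 99.75,
    ("M. avium", "M. avium subsp. paratuberculosis"): 97.90,
    ("M. ulcerans", "M. marinum"): 98.90,
    ("M. tuberculosis CDC1551", "M. avium"): 79.49,
    ("M. abscessus", "M. smegmatis"): 72.09,
}

for members in group_by_ani(pairs):
    print(members)
```

With these inputs the M. tuberculosis and M. bovis genomes fall into one group, while M. abscessus and M. smegmatis each remain alone.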
GENOME-BASED MULTI-LOCUS
SEQUENCE-TYPING (MLST)
Integration into the Point of Care (P.O.C.)
GENOME-BASED MULTISPACER
SEQUENCE TYPING (M.S.T)
• We developed MST first for paleomicrobiological
investigations of ancient Yersinia pestis (plague)
[Drancourt M. et al. Emerg Infect Dis. 2007;13:332-3].
• MST relies on genome alignments
MST PRINCIPLE
[Figure: spacers A, B and C lie between open reading frames Orf 1 to Orf 4; each spacer occurs as sequence variants (Seq 1, Seq 2, Seq 3) among strains.]

Strain     Spacer A   Spacer B   Spacer C   MST type
Strain 1   1          1          2          1
Strain 2   2          4          3          2
Strain 3   1          1          2          1
Strain 4   3          3          1          3

Seq 1 differs from Seq 2 by:
- SNP
- VNTR
- Deletion / Insertion
These 3 events have the same genetic weight
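The typing rule, one MST type per distinct combination of spacer genotypes, can be sketched in a few lines of Python. The function name is illustrative; the profiles are the four strains of the example, read as (Spacer A, Spacer B, Spacer C), and types are numbered in order of first appearance, which is an assumption about the numbering convention.

```python
def assign_mst_types(profiles):
    """Map each strain to an MST type number.

    Strains sharing the same tuple of spacer genotypes get the same type;
    new combinations are numbered in order of first appearance.
    """
    known_types = {}
    result = {}
    for strain, genotypes in profiles.items():
        key = tuple(genotypes)
        if key not in known_types:
            known_types[key] = len(known_types) + 1
        result[strain] = known_types[key]
    return result


profiles = {
    "Strain 1": (1, 1, 2),
    "Strain 2": (2, 4, 3),
    "Strain 3": (1, 1, 2),
    "Strain 4": (3, 3, 1),
}
print(assign_mst_types(profiles))
# {'Strain 1': 1, 'Strain 2': 2, 'Strain 3': 1, 'Strain 4': 3}
```

Because SNPs, VNTRs and insertions or deletions carry the same weight, any difference in a spacer sequence is enough to define a new genotype for that spacer.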
1- CHOICE OF SPACER
[Figure: aligned genomes of M. tuberculosis H37Rv and M. tuberculosis CDC1551, showing Gene A to Gene F and the spacers between them.]
Spacer sequence extraction: perl script software
Comparison of homologous spacers: Difseq software
Spacer selection criteria:
- Sequence length of ≤ 500 bp
- Sequence similarity of 70-99 %
- Spacer exhibiting ≥ 3 genotypes
8 most variable spacers:
- 4 VNTR loci described (ETR)
- 4 newly evaluated spacers
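The three selection criteria translate directly into a filter. This is a minimal Python sketch; the `Spacer` class, its field names and the candidate values are hypothetical and stand in for the output of the extraction and comparison steps.

```python
from dataclasses import dataclass


@dataclass
class Spacer:
    name: str
    length_bp: int          # spacer sequence length
    similarity_pct: float   # similarity between homologous spacers
    genotypes: int          # number of distinct genotypes observed


def passes_criteria(spacer):
    """Length <= 500 bp, similarity of 70-99 %, and at least 3 genotypes."""
    return (spacer.length_bp <= 500
            and 70.0 <= spacer.similarity_pct <= 99.0
            and spacer.genotypes >= 3)


# Hypothetical candidates, for illustration only.
candidates = [
    Spacer("spacer_1", 320, 96.5, 4),
    Spacer("spacer_2", 780, 95.0, 5),    # too long
    Spacer("spacer_3", 210, 100.0, 1),   # identical, no variability
    Spacer("spacer_4", 450, 88.0, 3),
]
selected = [s.name for s in candidates if passes_criteria(s)]
print(selected)
```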
M.S.T. APPLICATIONS: identification of
Mycobacterium tuberculosis complex species
* Selection of spacers
- MST1
- MT2221
- MST2
- MST3
- ETR-B
- ETR-C
- ETR-D
- Mtub21
- 6 single nucleotide polymorphisms
- Variable number tandem repeat (1-7)
- 2 deletions
All MTC species at once
[Djelouadji Z. et al. PLoS One. 2008;3:e2433]
Euro-American lineage
West-African 1 lineage
Indo-Oceanic lineage
East-African Indian lineage
West-African 2 lineage
Asian lineage
Beijing family
[Djelouadji Z. et al. PLoS Negl Trop Dis. 2008;2:e253]
APPLICATIONS OF M.S.T

Micro-organism            Application   Reference
M. avium complex          I, G          Cayrou C. et al. unpublished
M. tuberculosis complex   I, G          Djelouadji Z. et al. PLoS ONE 2008
                                        Djelouadji Z. et al. PLoS Negl Trop Dis 2008
T. whipplei               G             Li W. et al. Microbiology 2008
R. prowazekii             G             Zhu Y. et al. J Clin Microbiol. 2005
R. sibirica               G             Zhang L. et al. J Clin Microbiol. 2006
C. burnetii               G             Glazunova O. et al. Emerg Infect Dis. 2005
B. henselae               G             Li W. et al. J Clin Microbiol. 2006
B. quintana               G             Foucault C. et al. J Clin Microbiol. 2004
                                        Wooley MW. et al. J Clin Microbiol. 2007
Y. pestis                 G             Drancourt M. et al. Emerg Infect Dis. 2004
Identification based on specific genes / targets

Bacterial species                                        Gene target
Neisseria meningitidis                                   ctrA - crgA
Streptococcus pneumoniae                                 PlyN - LytA
Escherichia coli                                         rpoB
Listeria monocytogenes                                   hlyQ
Mycoplasma pneumoniae                                    P1
Bordetella pertussis                                     IS481
Chlamydia (C. pneumoniae, C. psittaci, C. trachomatis)   Omp2
Staphylococcus aureus                                    nucA
Borrelia burgdorferi                                     FliD
Streptococcus agalactiae (group B)                       Sip
GENOME-BASED, RANDOM
ACCESS IDENTIFICATION
Prospective: place of sequence-based
identification in near future
Specimen
Isolate
Culture
High-throughput
phenotypic identification
Sequence-based
identification
Prospective: DNA Sequence – based
Identification in the near Future
• Universal gene (16S rDNA, rpoB) sequence for the
description of new species
• Specific gene sequence to resolve unique
situations : oral Streptococcus spp.
• Decreasing role for routine identification of
isolates :
- Mass spectrometry profiling [Sauer S. et al. PLoS ONE 2008;3:e2843]
- High-throughput biochemical profiling
TARGETING GENOME REPEATS:
TO INCREASE TEST SENSITIVITY
Resolution of Genome-based sequencing methods
for bacteria detection, identification, genotyping
Multispacer sequence typing MST
Surface protein gene sequencing
Multilocus sequence typing MLST
Random access
sequence
Species-specific
gene
Repeats
species
Sub-species
isolates