Sequence analysis and molecular phylogeny (Analyse de séquences et phylogénie moléculaire)

From phylogenetics to phylogenomics:
Lessons from the Eucarya
École doctorale E2M2 – 2015-2016
(http://www.frangun.org)
Céline Brochier
Guy Perrière
Analyse de séquences et phylogénie moléculaire (Céline Brochier-Armanet 2015-2016)
Phylogeny of Eucarya (eighties)
Based on SSU rRNA
A well-resolved asymmetric base
An unresolved crown
Late emergence of multicellular eukaryotes
Early emergence of amitochondriate lineages (Archezoa)
⇒ Gradual increase in the complexity of the eukaryotic cell
[Figure: SSU rRNA tree; labels: Asymmetric base, Crown]
(Adapted from Sogin Early evolution and the origin of eukaryotes - Curr. Opin. Genet. Dev. – 1991)
Origin of the mitochondria
(From Keeling A kingdom’s progress: Archezoa and the origin of eukaryotes – BioEssays - 1998)
Archezoa
Cavalier-Smith Eukaryotes with no mitochondria – Nature - 1987
Cavalier-Smith Archaebacteria and Archezoa - Nature - 1989
Metamonads: Retortamonas
Archamoebae: Pelomyxa
Parabasalia: Trichomonas
Microsporidia: Nosema
(From Keeling A kingdom’s progress: Archezoa and the origin of eukaryotes – BioEssays - 1998)
The archezoa hypothesis
“Archezoa are eukaryotes which primitively lack mitochondria”
The nucleus originated before the mitochondrial endosymbiosis
The first eukaryotes were anaerobes
Archezoans might provide insights into the nature of ancestral eukaryotic genomes and biology
BUT… the hypothesis would fall if:
Mitochondrial genes are found in archezoan genomes
Archezoans branch among aerobic species with mitochondria
Mitochondrion-derived organelles are found in archezoans
Mitochondrial genes in archezoan genomes
[Figure: gene tree (Mol. Biol. Evol. 2001); labels: Mitochondrial genes; Parabasalia (hydrogenosomes): Trichomonas vaginalis 1 and 2; Metamonads (mitosomes): Giardia intestinalis; Proteobacteria]
Conflicting phylogenetic signals
[Figure: three trees, one per marker: SSU rRNA, Actin, β-Tubulin]
(Philippe H. et al. Early-branching or fast-evolving eukaryotes? - Proc. Biol. Sci. - 2000)
The phylogeny of Eucarya is severely affected by tree reconstruction artefacts
[Figure: two four-taxon trees (A, B, C, D) with branch parameters p and q; condition shown: p < q²]
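The four-taxon figure can be explored numerically. The sketch below is only an illustration under assumptions that the slide does not state: two-state symmetric characters, a true tree ((A,B),(C,D)), a change probability q on the two long branches leading to A and C, and a change probability p on the three short branches. The function names are hypothetical. It computes the exact probability of each parsimony-informative site pattern and reports which topology the most frequent pattern supports.

```python
from itertools import product


def pattern_probabilities(p, q):
    """Exact informative-pattern probabilities on the true tree ((A,B),(C,D)).

    Assumed setup (not given on the slide): binary symmetric characters,
    A and C on long branches (change probability q), B, D and the internal
    branch short (change probability p).
    """
    def step(parent, child, change):
        # Probability of the child state given the parent state on one branch.
        return change if parent != child else 1.0 - change

    totals = {"AB|CD": 0.0, "AC|BD": 0.0, "AD|BC": 0.0}
    # u and v are the two internal nodes; u is used as the root (uniform state).
    for u, v, a, b, c, d in product((0, 1), repeat=6):
        prob = (0.5 * step(u, a, q) * step(u, b, p) * step(u, v, p)
                * step(v, c, q) * step(v, d, p))
        if a == b and c == d and a != c:
            totals["AB|CD"] += prob
        elif a == c and b == d and a != b:
            totals["AC|BD"] += prob
        elif a == d and b == c and a != b:
            totals["AD|BC"] += prob
    return totals


def parsimony_choice(p, q):
    """Topology supported by the most frequent informative pattern."""
    totals = pattern_probabilities(p, q)
    return max(totals, key=totals.get)


if __name__ == "__main__":
    for p, q in [(0.05, 0.10), (0.05, 0.40)]:
        print(f"p={p} q={q} -> {parsimony_choice(p, q)}")
```

With p = 0.05, the true grouping AB|CD is favoured for q = 0.10, whereas for q = 0.40 the pattern that groups the two long branches (AC|BD) becomes the most probable one, so adding more sites only strengthens the wrong tree.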
Microsporidia are related to Fungi
Using more accurate evolutionary models (Slow-Fast (S/F) analysis, removal of invariant sites, Gamma corrections, etc.) and methods (ML, Bayesian) resolves the phylogenetic position of Microsporidia
ML – V-ATPase (Vivares et al. Curr Opin Microbiol. 2002)
ML – RPB1 (Hirt et al. PNAS 2001)
Hydrogenosomes and mitosomes are mitochondrial remnants
[Figure: micrographs of the organelles (2002; 2003; Nyctotherus, 2005); scale bars: 50 nm and 1 µm]
Phylogeny of Eucarya (nineties)
[Figure: eukaryotic tree; labels: Microsporidia, Crown, ?]
Why such a lack of resolution?
Radiation
Too little phylogenetic signal has been recorded in sequences (∆t short)
Substitutional saturation
The ancient phylogenetic signal is progressively erased by multiple substitutions
[Figure: observed substitutions plotted against the real number of substitutions]
(Gribaldo and Brochier - Phylogeny of prokaryotes: does it exist and why should we care? - Res Micro - 2009)
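The gap between observed and real substitutions can be illustrated with a worked formula. The sketch below assumes the Jukes-Cantor (JC69) nucleotide model, which the slide does not name; under that model the expected fraction of differing sites after d substitutions per site is 3/4 · (1 − exp(−4d/3)), so it levels off at 0.75 however large d becomes. The function names are illustrative.

```python
import math


def observed_fraction(d):
    """Expected fraction of differing sites after d substitutions per site (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def corrected_distance(p_obs):
    """Inverse relation: substitutions per site estimated from an observed fraction."""
    if not 0.0 <= p_obs < 0.75:
        raise ValueError("p_obs must lie in [0, 0.75) under JC69")
    return -0.75 * math.log(1.0 - 4.0 * p_obs / 3.0)


if __name__ == "__main__":
    for d in (0.1, 0.5, 1.0, 2.0, 5.0):
        print(f"real = {d:.1f}  observed = {observed_fraction(d):.3f}")
```

As d grows, each additional substitution changes the observed fraction less and less, which is the saturation plateau: beyond it, very different amounts of real change become indistinguishable in the data.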
How can we overcome the lack of signal?
Improve methods of reconstruction and evolutionary models
(Van de Peer, Ben Ali, Meyer – Gene - 2000)
How can we overcome the lack of signal?
Improve the taxonomic sampling to avoid tree reconstruction artefacts and reduce saturation
28S rRNA of 31 gnathostomes (parsimony – 529 positions)
(Lecointre et al. – Mol. Phyl. Evol. - 1993)
How can we overcome the lack of signal?
Increase the amount of data
Improving the amount of data is not sufficient
The use of inaccurate evolutionary models can severely bias phylogenetic inferences
(~70 nuclear genes, 17,807 amino acid positions)
[Figure: two trees of Fungi, Deuterostomia, Arthropoda and Nematoda, one inferred under WAG+F+Γ and one under CAT+F+Γ]
(Lartillot, Brinkmann and Philippe – BMC Evol. Biol. - 2007)
Improving the amount of data is not sufficient
Poor taxonomic sampling can severely bias phylogenetic inferences
(~70 nuclear proteins, 17,807 amino acid positions)
[Figure: two trees (WAG+F+Γ) of Fungi, Deuterostomia, Arthropoda and Nematoda; one tree also includes Cnidaria+Choano]
(Lartillot, Brinkmann and Philippe – BMC Evol. Biol. - 2007)
Improving the amount of data is not sufficient
Break the long branches to avoid Long Branch Attraction artefacts
146 nuclear proteins (35,346 amino acid positions), ML (JTT+Γ)
(Delsuc, Brinkmann and Philippe – Nat. Rev. Genet. - 2005)
Improving the amount of data is not sufficient
Compositional biases and multiple substitutions can severely affect tree inference…
46 ribosomal proteins (r-prot), 137 proteobacterial species, 15,372 nucleotide positions
(Ramulu et al. – Mol. Phylogenet. Evol. - 2014)
Improving the amount of data is not sufficient
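… especially when using inaccurate evolutionary models
[Figure: two trees of the proteobacterial classes (α, β, γ, δ, ε), one inferred under GG+Γ and one under GTR+Γ]
46 ribosomal proteins (r-prot), 137 proteobacterial species, 15,372 nucleotide positions
(Ramulu et al. – Mol. Phylogenet. Evol. - 2014)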
Garbage in, garbage out
The recoding of multiple alignments can help to overcome systematic biases
4 mitochondrial genes (3,729 nucleotide positions)
ML (GTR+I+Γ) ⇒ Ticks emerge with A+T rich insects
RY recoding + ML (CF+I+Γ) ⇒ Ticks emerge within other chelicerates
(Delsuc, Brinkmann and Philippe – Nat. Rev. Genet. - 2005)
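RY recoding itself is a simple transformation: purines (A, G) become R and pyrimidines (C, T) become Y, so transitions and most A+T versus G+C compositional differences disappear from the alignment. The sketch below is a minimal illustration with hypothetical function names and toy sequences; it is not the pipeline used in the cited study.

```python
RY_CODE = {"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"}


def ry_recode(seq):
    """Recode nucleotides as purines (R) or pyrimidines (Y).

    Gaps and any other symbol are kept unchanged (upper-cased).
    """
    return "".join(RY_CODE.get(base, base) for base in seq.upper())


def at_content(seq):
    """Fraction of A and T among the unambiguous nucleotides of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "AT" for b in bases) / len(bases) if bases else 0.0


if __name__ == "__main__":
    # Toy sequences (made up): opposite base composition, same R/Y pattern.
    at_rich, gc_rich = "ATATTAAT", "GCGCCGGC"
    print(at_content(at_rich), at_content(gc_rich))
    print(ry_recode(at_rich), ry_recode(gc_rich))
```

The two toy sequences differ at every position and have opposite A+T content, yet they are identical after recoding, which is why a composition-driven attraction between A+T rich lineages can vanish once the data are RY-recoded.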
Garbage in, garbage out
Removing the fastest-evolving sites (Slow-Fast method) can help to overcome systematic biases and to detect Long Branch Attraction artefacts
146 nuclear genes (35,371 amino acid positions), ML (JTT + Γ)
(Delsuc, Brinkmann and Philippe – Nat. Rev. Genet. - 2005)
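The idea of ranking sites by how fast they evolve can be sketched as follows. This is a simplified illustration, not the published Slow-Fast procedure: it assumes that the groups passed in are well-supported monophyletic groups, and it uses the number of distinct states minus one within each group as a crude lower bound on the number of changes, where a real analysis would count parsimony steps on the group subtrees. All names and the toy data are hypothetical.

```python
def site_scores(alignment, groups):
    """Per-site score: summed minimum number of changes within each group."""
    length = len(next(iter(alignment.values())))
    scores = []
    for i in range(length):
        total = 0
        for taxa in groups.values():
            # Ignore gaps and unknown states when counting distinct residues.
            states = {alignment[t][i] for t in taxa} - {"-", "?", "X"}
            total += max(len(states) - 1, 0)
        scores.append(total)
    return scores


def keep_slow_sites(alignment, groups, max_changes):
    """Return the alignment restricted to sites scoring at most max_changes."""
    scores = site_scores(alignment, groups)
    kept = [i for i, score in enumerate(scores) if score <= max_changes]
    return {t: "".join(seq[i] for i in kept) for t, seq in alignment.items()}


if __name__ == "__main__":
    # Toy alignment (made up) with two predefined groups.
    aln = {"a1": "MKVLA", "a2": "MKILA", "a3": "MRTLA",
           "f1": "MKVLS", "f2": "MKALS", "f3": "MKGLS"}
    groups = {"animals": ["a1", "a2", "a3"], "fungi": ["f1", "f2", "f3"]}
    print(site_scores(aln, groups))
    print(keep_slow_sites(aln, groups, max_changes=1))
```

Rebuilding the tree on progressively stricter subsets (lower max_changes) and watching whether a suspicious grouping disappears is one way to use such a filter to detect long branch attraction.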
Supermatrix approaches have improved our knowledge of eukaryotic phylogeny
Year  Author                     Number of genes  Main results
2002  Bapteste et al.            123              Monophyly of Amoebozoa
2004  Philippe et al.            129              Monophyly of Opisthokonta
2005  Rodríguez-Ezpeleta et al.  143              Monophyly of Plantae
2007  Patron et al.              102              Monophyly of Haptophyta + Cryptophyta
2007  Burki et al.               123              Question the monophyly of Chromalveolata
2008  Burki et al.               135              Early emergence of Excavata within Bikonta
2007  Rodríguez-Ezpeleta et al.  143              Monophyly of Excavata and question the monophyly of Chromalveolata
2009  Hampl et al.               143              Monophyly of Excavata
2010  Baurain et al.             108              Question the monophyly of Chromalveolata
2010  Burki et al.               167              Monophyly of Rhizaria
2012  Burki et al.               258              Question the monophyly of Chromalveolata
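A supermatrix is built by concatenating the individual gene alignments taxon by taxon. The sketch below shows one possible way to do this, assuming each gene is already aligned and stored as a taxon-to-sequence dictionary; taxa missing from a gene are padded with a missing-data symbol. The function name, the "?" symbol and the toy data are assumptions for illustration only.

```python
def build_supermatrix(gene_alignments, missing="?"):
    """Concatenate per-gene alignments into one supermatrix.

    Returns the concatenated sequences and a list of
    (gene, first_column, last_column) partitions, 1-based and inclusive.
    """
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    pieces = {t: [] for t in taxa}
    partitions = []
    start = 0
    for gene, aln in gene_alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned")
        length = lengths.pop()
        for t in taxa:
            # A taxon absent from this gene contributes only missing data.
            pieces[t].append(aln.get(t, missing * length))
        partitions.append((gene, start + 1, start + length))
        start += length
    matrix = {t: "".join(parts) for t, parts in pieces.items()}
    return matrix, partitions


if __name__ == "__main__":
    # Toy data (made up): two short genes with incomplete taxon overlap.
    genes = {
        "geneA": {"Homo": "MKV", "Giardia": "MRV"},
        "geneB": {"Homo": "LLS", "Arabidopsis": "LIS"},
    }
    matrix, partitions = build_supermatrix(genes)
    for taxon, seq in matrix.items():
        print(f"{taxon:12s} {seq}")
    print(partitions)
```

Keeping the partition boundaries makes it possible to apply a separate model to each gene later, and the share of missing-data symbols per taxon gives a quick check on how complete the matrix is.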
Phylogenomics led to the emergence of super-phyla
(Simpson and Roger- The ‘real’ kingdoms of eukaryotes – Current biology 2004)
(Baldauf et al. The Deep Roots of Eukaryotes – Science – 2003)
(Pawlowski, Jan Protist Evolution and Phylogeny. In: eLS. John Wiley & Sons, Ltd: Chichester – 2014)
The Opisthokonta
Typical features:
A single posteriorly inserted flagellum
Flattened mitochondrial cristae
Specific insertion in EF1-alpha
Divided into Holozoa and Holomycetes
[Figure: tree based on EF-1α + HSP70 + actin + tubulin; taxon labels: Metazoa, Ministeriids, Choanoflagellates, Ichthyosporeans, Corallochytreans, Fungi, Nucleariids; support values shown: 96, 84, 94]
EF1-alpha insertion:
Homo           WFGGWKVTRKDGNASGTTLL
Saccharomyces  WYGGWEKETKAGVVKGKTLL
Glugea         WFKGWKPVSGAGDSI-FTLE
Arabidopsis    WYKG------------PTLL
Trypanosoma    WYKG------------PILV
Plasmodium     WYKG------------RTLI
Giardia        WYEG------------PCLI
Sulfolobus     WYNG------------PTLE
E. coli        WEAK-----------ILELA
(Steenkamp et al. The Protistan Origins of Animals and Fungi - MBE - 2006)
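The EF1-alpha insertion can be read directly from the alignment fragment. The sketch below pairs the taxon names with the sequence rows in the order in which they appear on the slide, which is an assumption; the function names and the 50% thresholds are also illustrative choices. It locates the columns that are gapped in most sequences and lists the taxa that carry residues there.

```python
# Alignment fragment from the slide; taxon/row pairing assumed from their order.
ALIGNMENT = {
    "Homo":          "WFGGWKVTRKDGNASGTTLL",
    "Saccharomyces": "WYGGWEKETKAGVVKGKTLL",
    "Glugea":        "WFKGWKPVSGAGDSI-FTLE",
    "Arabidopsis":   "WYKG------------PTLL",
    "Trypanosoma":   "WYKG------------PILV",
    "Plasmodium":    "WYKG------------RTLI",
    "Giardia":       "WYEG------------PCLI",
    "Sulfolobus":    "WYNG------------PTLE",
    "E. coli":       "WEAK-----------ILELA",
}


def mostly_gapped_columns(alignment, min_fraction=0.5):
    """Indices of columns where at least min_fraction of sequences show a gap."""
    n = len(alignment)
    length = len(next(iter(alignment.values())))
    return [i for i in range(length)
            if sum(seq[i] == "-" for seq in alignment.values()) / n >= min_fraction]


def taxa_with_insertion(alignment, min_fraction=0.5):
    """Taxa with residues in at least min_fraction of the mostly gapped columns."""
    cols = mostly_gapped_columns(alignment, min_fraction)
    if not cols:
        return []
    return [taxon for taxon, seq in alignment.items()
            if sum(seq[i] != "-" for i in cols) / len(cols) >= min_fraction]


if __name__ == "__main__":
    print(mostly_gapped_columns(ALIGNMENT))
    print(taxa_with_insertion(ALIGNMENT))
```

On this fragment the taxa returned are Homo, Saccharomyces and Glugea, that is an animal, a fungus and a microsporidian, in line with the insertion being listed as a feature of Opisthokonta.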
The Amoebozoa
Mainly composed of amoebae and amoeboid flagellates
Some species have flagella and/or subpseudopodia
Many species have branching, irregular mitochondrial cristae
Amitochondriate members
Artificially divided into amoebae, slime moulds and flagellated amoebae
[Figure: tree from 123 genes (25,032 aa positions), JTT + Γ4; taxon label: Pelomyxa; support value shown: 97]
(Bapteste et al - PNAS - 2002)
The Archaeplastida
Typical features:
Presence of primary plastids
Duplication of the cytosolic fructose-1,6-bisphosphate aldolase (FBA)
Type I transcription factor pBRp
Plastids bounded by two membranes
Divided into Viridiplantae, Rhodoplastida and Glaucophyta
[Figure: tree from 143 genes (30,113 aa positions), ML and BI; support values shown: 98, 97]
(Rodríguez-Ezpeleta et al.– Current Biology - 2005)
The SAR clade or Harosa
Typical features:
Presence of secondary (red and green) photosynthetic lineages
Duplication of the GTPase Rab1
Divided into Stramenopiles, Alveolata and Rhizaria
[Figure: tree from 258 genes (55,881 aa positions), BI, CAT + Γ4]
(Burki et al. The evolutionary history of haptophytes and cryptophytes: phylogenomic evidence for separate origins – Proc Biol Sci. 2012)
The CCTH clade or Hacrobia
Typical features:
Free-living; heterotrophic, mixotrophic or autotrophic
No synapomorphies
No parasites
Divided into Cryptomonads, Centrohelids, Telonemids and Haptophyta, plus Kathablepharids and Picobiliphytes
[Figure: tree from 127 genes (25,235 aa positions); BI (CAT + Γ4), RAxML + recoding]
(Burki et al. Large-Scale Phylogenomic Analyses Reveal That Two Enigmatic Protist Lineages, Telonemia and Centroheliozoa, Are Related to Photosynthetic Chromalveolates – GBE 2009)
The Excavata
Typical features:
Heterotrophic flagellates
Most live in oxygen-poor environments
Most harbor non-aerobic mitochondria
Contain a distinctive longitudinal feeding groove
Include secondary photosynthetic lineages (Euglenozoa)
Divided into Metamonads (amitochondriate) and Discoba (mitochondriate)
[Figure: tree from 143 genes (35,584 aa positions), ML, WAG + Γ4; taxon labels: Pseudociliata, Heterolobosea, Jakobida, Kinetoplastida, Diplomonad, Euglenozoa, Parabasalia]
(Hampl et al. Phylogenomic analyses support the monophyly of Excavata and resolve relationships among eukaryotic "supergroups" – PNAS 2009)
Is the story over?
The phylogenetic position of some lineages remains elusive
Apusozoa, Breviates, Collodictyonids, Colponema, Hemimastigophora, Palpitomonas, etc.
Uncultured protists (picoeukaryotes, etc.)
What are the relationships among the super-phyla?
Where is the root of Eucarya?
Between Unikonta and Bikonta? (DHFR/TS fusion)
Between Plantae and other phyla? (distribution pattern of rare amino acids)
Between Euglenozoa and other phyla? (cytochrome synthesis pathway)