Ancestral Sequence Reconstruction: methods and applications

Julien Dutheil
<[email protected]>
BiRC – Bioinformatics Research Center, University of Århus
http://birc.au.dk/~jdutheil/Teaching/
February 2008
Julien Dutheil (BiRC – University of Århus)
Ancestral Sequence Reconstruction
February 2008
1 / 18
Introduction
... ancestral sequences?
Putative bio-sequence (DNA, RNA, codon, protein) of an extinct
organism.
... reconstruction?
Apart from a few special cases (ancient DNA), ancestral sequences
cannot be observed and have to be inferred from their contemporary
homologues.
... so what?
The methodology is close to the study of the fossil record: it gives an image
of the past that helps us understand what is going on in the present.
In practice, fossil DNA is rare, particularly for very ancient
times.
Outline of the lecture
1. Reconstructing ancestral sequences
2. Application: gene resurrection
Reconstructing ancestral sequences
Using Maximum Parsimony
Ancestral states are computed together
with the score: Walter Fitch’s
algorithm [Fitch, 1971]
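The ancestral set $S_x$ at node x is
defined as
\[
S_x =
\begin{cases}
S_{x,1} \cap S_{x,2} & \text{if } S_{x,1} \cap S_{x,2} \neq \emptyset \\
S_{x,1} \cup S_{x,2} & \text{if } S_{x,1} \cap S_{x,2} = \emptyset
\end{cases}
\]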
The parsimony score is obtained by
counting the number of unions.
Reconstructing ancestral sequences
Getting ancestral states
To reconstruct the scenario of
ancestral states, we need a
second pass on the tree (prefix order, from the root to the leaves).
Several equally parsimonious
scenarios are found in many
cases. Swofford and Maddison
[1987] introduced the
ACCTRAN and DELTRAN
methods to choose between these
scenarios.
However, parsimony does not account for branch
lengths: all substitutions are
considered equal.
Reconstructing ancestral sequences
Models, parameters, random variables and likelihood
Parameters: tree (topology + branch lengths), substitution matrix (Q), ...
Random variables: ancestral states, evolutionary rate distribution, ...
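Felsenstein’s recursion:
[Figure: node n (subtree $T_n$, state $x_n$) connected to its son nodes $n_1$ ($T_{n_1}$, $x_{n_1}$) and $n_2$ ($T_{n_2}$, $x_{n_2}$) by branches of lengths $t_{n_1}$ and $t_{n_2}$]
\[
L_i(T_n, x_n) =
\begin{cases}
1 & \text{if } T_n \text{ is a leaf with state } x_n \text{ at site } i, \\
0 & \text{if } T_n \text{ is a leaf with state} \neq x_n \text{ at site } i, \\
\left( \sum_{x_{n_1}} p_{x_n, x_{n_1}}(t_{n_1}) \times L_i(T_{n_1}, x_{n_1}) \right)
\times
\left( \sum_{x_{n_2}} p_{x_n, x_{n_2}}(t_{n_2}) \times L_i(T_{n_2}, x_{n_2}) \right) & \text{otherwise.}
\end{cases}
\]
$n_1$ and $n_2$ are the son nodes of node n.
All $p_{x_n, x_{n_1}}$ and $p_{x_n, x_{n_2}}$ are given by the matrix $e^{Q \times t}$.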
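The recursion translates almost line by line into code. The sketch below is only an illustration under stated assumptions: it uses nucleotides and the Jukes-Cantor model (chosen because its transition probabilities have a closed form, so no matrix exponential is needed), and the tree encoding and function names are hypothetical.

```python
import math

STATES = "ACGT"


def jc69(x, y, t):
    """p_{x,y}(t) under Jukes-Cantor (assumed model, closed form of e^{Qt})."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay


def cond_lik(tree, x):
    """L_i(T_n, x_n): likelihood of the data below node n, given state x at n.

    Assumed encoding: a leaf is a one-letter state; an internal node is a
    tuple ((subtree1, t1), (subtree2, t2)) with branch lengths t1 and t2.
    """
    if isinstance(tree, str):
        return 1.0 if tree == x else 0.0
    result = 1.0
    for subtree, t in tree:
        result *= sum(jc69(x, y, t) * cond_lik(subtree, y) for y in STATES)
    return result


def site_likelihood(tree):
    """L_i: sum over root states, weighted by the JC69 frequencies (1/4)."""
    return sum(0.25 * cond_lik(tree, x) for x in STATES)
```

A quick sanity check is that the likelihoods of all possible leaf patterns sum to one, for example over the 16 patterns of a two-leaf tree.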
Reconstructing ancestral sequences
Estimation and reconstruction
Parameter estimation: maximum likelihood
Reconstruction of hidden random variables: Bayesian approach
\[ \Pr(X \mid D, \Theta) = \frac{\Pr(D, X \mid \Theta)}{\Pr(D \mid \Theta)} \]
[X = variable, D = data, Θ = parameters]
In this equation Θ is assumed to be known, which is actually not the
case. Two approaches are used:
◮ Use a prior distribution for Θ (full Bayesian = hierarchical Bayesian)
◮ Use a “degenerate” distribution, Θ = Θ̂, the ML estimate of Θ
(empirical Bayesian)
The empirical Bayesian approach was first applied to
ancestral sequence reconstruction by Ziheng Yang [Yang et al., 1995].
Reconstructing ancestral sequences
Marginal reconstruction (Yang)
[Figure: example tree with leaf states V, A, V, D, D, E, V and an internal node of unknown state x]
We are interested in the state at a
particular node (x)
The probability of each state at site i is
given by:
\[ \Pr(X_i = x \mid D, \Theta) = \frac{\Pr(x, D \mid \Theta)}{\Pr(D \mid \Theta)} = \frac{L_{i,x}}{L_i} \]
We keep the state with the maximum
probability.
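As an illustration, the marginal posterior at the root can be sketched as below. The Jukes-Cantor nucleotide model, the tree encoding and the function names are assumptions made for the example. Only the root is handled here; with a reversible model, another node could in principle be treated by re-rooting the tree at that node.

```python
import math

STATES = "ACGT"


def jc69(x, y, t):
    """p_{x,y}(t) under Jukes-Cantor (assumed model)."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay


def cond_lik(tree, x):
    """Likelihood of the data below a node, given state x at that node.

    Assumed encoding: leaf = one-letter state, internal node = tuple of
    (subtree, branch_length) pairs.
    """
    if isinstance(tree, str):
        return 1.0 if tree == x else 0.0
    result = 1.0
    for subtree, t in tree:
        result *= sum(jc69(x, y, t) * cond_lik(subtree, y) for y in STATES)
    return result


def marginal_posterior(tree):
    """Pr(X_i = x | D, Theta) for every state x at the root of `tree`."""
    joint = {x: 0.25 * cond_lik(tree, x) for x in STATES}   # L_{i,x}
    total = sum(joint.values())                             # L_i
    return {x: value / total for x, value in joint.items()}


def marginal_state(tree):
    """State with the maximum posterior probability at the root."""
    posterior = marginal_posterior(tree)
    return max(posterior, key=posterior.get)
```

With one leaf A on a short branch and one leaf C on a long branch, `marginal_state` favours A, because the short branch leaves less opportunity for change.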
Reconstructing ancestral sequences
Joint reconstruction (Yang)
[Figure: example tree with leaf states V, A, V, D, D, E, V and internal nodes of unknown states x1 to x6]
We are interested in the states at all
nodes ({x1 . . . xn })
The probability of a given set of states at site i
is given by:
\[ \Pr(X_{i,1} = x_1, \ldots, X_{i,n} = x_n \mid D, \Theta) = \frac{\Pr(x_1, \ldots, x_n, D \mid \Theta)}{\Pr(D \mid \Theta)} = \frac{L_{i,x_1,\ldots,x_n}}{L_i} \]
We keep the set of states with the
maximum probability
Problem: there are many possible sets of
states!
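To make the size of the problem concrete, here is a brute-force sketch that enumerates every set of internal states. It is deliberately naive and only usable on tiny trees; the Jukes-Cantor nucleotide model, the tree encoding and the function names are assumptions made for the example.

```python
import itertools
import math

STATES = "ACGT"


def jc69(x, y, t):
    """p_{x,y}(t) under Jukes-Cantor (assumed model)."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay


def count_internal(tree):
    """Number of internal nodes (leaf = str, node = tuple of (subtree, t))."""
    if isinstance(tree, str):
        return 0
    return 1 + sum(count_internal(sub) for sub, _ in tree)


def joint_lik(tree, states):
    """Pr(x_1..x_n, D | Theta), internal states being listed in pre-order."""
    it = iter(states)
    root_state = next(it)

    def below(node, x):
        value = 1.0
        for sub, t in node:
            if isinstance(sub, str):     # leaf: observed state
                value *= jc69(x, sub, t)
            else:                        # internal node: next assigned state
                y = next(it)
                value *= jc69(x, y, t) * below(sub, y)
        return value

    return 0.25 * below(tree, root_state)


def joint_reconstruction(tree):
    """Return (best_states, posterior_probability, number_of_sets_tried)."""
    n = count_internal(tree)
    candidates = list(itertools.product(STATES, repeat=n))
    total = sum(joint_lik(tree, s) for s in candidates)      # L_i
    best = max(candidates, key=lambda s: joint_lik(tree, s))
    return best, joint_lik(tree, best) / total, len(candidates)
```

With 4 states and n internal nodes there are 4^n candidate sets (20^n for proteins), which is why exhaustive enumeration quickly becomes impractical.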
Reconstructing ancestral sequences
In practice. . .
The PAML software (Phylogenetic Analysis by Maximum
Likelihood) of Ziheng Yang can reconstruct ancestral sequences
with both the marginal and joint methods
Tal Pupko proposed a fast algorithm to reconstruct ancestral
sequences with the joint method, available in the FastML
software [Pupko et al., 2000]
FastML also implements a heuristic algorithm for the joint method
with a non-uniform distribution of substitution rates (Γ distribution)
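The idea behind a fast joint reconstruction can be sketched with dynamic programming: replace the sums of the likelihood recursion by maxima and remember which state achieved each maximum. The code below is a sketch in that spirit for the homogeneous-rate case only. It is not the FastML implementation, and the Jukes-Cantor nucleotide model, the tree encoding and the function names are assumptions made for the example.

```python
import math
from functools import lru_cache

STATES = "ACGT"


def jc69(x, y, t):
    """p_{x,y}(t) under Jukes-Cantor (assumed model)."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay


@lru_cache(maxsize=None)
def best_below(subtree, t, parent_state):
    """Best (score, states) for a subtree hanging on a branch of length t,
    given the state at its parent. States are listed in pre-order.

    Assumed encoding: leaf = one-letter state, internal node = tuple of
    (subtree, branch_length) pairs. The cache makes each (node, parent
    state) pair computed only once.
    """
    if isinstance(subtree, str):
        return jc69(parent_state, subtree, t), ()
    best = (-1.0, ())
    for y in STATES:
        score, states = jc69(parent_state, y, t), (y,)
        for child, child_t in subtree:
            child_score, child_states = best_below(child, child_t, y)
            score *= child_score
            states += child_states
        if score > best[0]:
            best = (score, states)
    return best


def joint_dp(tree):
    """Return (max joint likelihood, internal states in pre-order)."""
    best = (-1.0, ())
    for x in STATES:
        score, states = 0.25, (x,)       # JC69 root frequency
        for child, child_t in tree:
            child_score, child_states = best_below(child, child_t, x)
            score *= child_score
            states += child_states
        if score > best[0]:
            best = (score, states)
    return best
```

The work grows linearly with the number of nodes instead of exponentially, at the price of assuming a single rate for all sites.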
Reconstructing ancestral sequences
Additional remarks
The combination of the marginal reconstructions at all nodes may not be equal to the joint
reconstruction. The chances that the two reconstructions differ
increase with the presence of long branches in the tree.
The reconstructed sequences depend on the model and parameters,
including of course the phylogeny. In most cases, it is very useful to
compare the results obtained with different models.
The reconstruction process is uncertain! To assess this
uncertainty, one can:
◮ check the second highest probability and compare it with the highest
one,
◮ sample several sequences from the posterior distribution instead of
keeping only the one with the highest probability.
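Both checks are easy to script once the per-site posterior probabilities at a node are available. The sketch below assumes they are given as one dictionary per site (state to probability); the function names and the input format are hypothetical.

```python
import random


def confidence_gap(posterior):
    """Return (best_state, best_probability, second_best_probability)."""
    ranked = sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)
    (best, p1), (_, p2) = ranked[0], ranked[1]
    return best, p1, p2


def sample_sequences(posteriors, n, seed=0):
    """Draw n ancestral sequences, each site sampled from its own posterior."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    sequences = []
    for _ in range(n):
        sites = [rng.choices(list(p), weights=list(p.values()))[0]
                 for p in posteriors]
        sequences.append("".join(sites))
    return sequences
```

A site where the two highest probabilities are close is a natural candidate for testing alternative residues, and the sampled sequences give a set of plausible ancestors rather than a single point estimate.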
Application: gene resurrection
Principle
Reconstruct one or several ancestral sequences in silico
Synthesize the corresponding protein
Study the biological characteristics of the ancestral protein
[Figure: experimental pipeline starting from a reconstructed sequence (ATTAGCATCGATACTGCGTTGCGTGCCAAC): synthesis, amplification (PCR), cloning into a vector, expression in a cell, protein purification, analysis]
Application: gene resurrection
The sight of dinosaurs
[Chang et al., 2002]
Visual pigment: rhodopsin, involved in vision in dim light
The archosaurs include
the extinct dinosaurs and
living birds and
crocodiles. Their actual
physiology is to a large
extent unknown.
The functional ancestral
rhodopsin absorbs at
508 nm, which is “redder”
than in all known living
vertebrates
This result is consistent with the hypothesis of a nocturnal ancestor
Application: gene resurrection
The color of corals [Ugalde et al., 2004]
Three colors: blue, green and red
Convergences occurred: which color was the ancestral
state?
In silico reconstruction with nucleotide,
codon and amino-acid models, resulting in
a good consensus
Application: gene resurrection
Galliforms’ lysozyme [Malcolm et al., 1990]
The ancestor of ancestral
reconstruction :)
Three sites of interest
Species                    Site 1   Site 2   Site 3
Pheasant                   THR      ILE      SER
Green pheasant             THR      ILE      SER
Quail                      SER      VAL      THR
California quail           SER      VAL      THR
Guinea fowl                SER      VAL      THR
Chicken                    THR      ILE      SER
Japanese quail             THR      ILE      SER
Turkey                     THR      ILE      SER
Satyr tragopan             THR      ILE      SER
Temminck’s tragopan        THR      ILE      SER
Himalayan pheasant         THR      ILE      SER
Blue peafowl               THR      ILE      SER
Reeves’s pheasant          THR      ILE      SER
Lady Amherst’s pheasant    THR      ILE      SER
Copper pheasant            THR      ILE      SER
Bobwhite quail             THR      ILE      SER
Bare-faced curassow        THR      ILE      SER
Plain chachalaca           THR      ILE      SER
The authors used site-directed
mutagenesis to reconstruct all
possible ancestral sequences
Synthesis, biochemical study
and crystallization of all
resulting proteins
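With two alternative residues at each of the three sites, "all possible ancestral sequences" means 2 × 2 × 2 = 8 variants. The short sketch below enumerates them; the one-letter codes (T = THR, S = SER, I = ILE, V = VAL) and the function name are conventions chosen for the example.

```python
from itertools import product

# The two residues observed at each of the three sites of interest.
SITE_RESIDUES = [("T", "S"), ("I", "V"), ("S", "T")]


def candidate_ancestors(site_residues):
    """All combinations obtained by picking one residue per site."""
    return ["".join(combo) for combo in product(*site_residues)]


if __name__ == "__main__":
    print(candidate_ancestors(SITE_RESIDUES))
```

The eight three-letter labels produced this way are the ones used to name the variants (TIS, SVT, and so on).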
Application: gene resurrection
Results
All putative ancestral proteins
are stable and active
There is a relation between the
total volume and stability
[Figure: thermostability plotted against side-chain volume for the variants SIT, TVT, TIS, SVT, SIS, TIT, TVS and SVS]
Some ancestral configurations
however are more stable than
the observed ones!
[Figure: panels showing the tm of the variants TVT, TIS, SVT, SIS, TVS, SIT, TIT and SVS]
Application: gene resurrection
References
B. S. W. Chang, K. Jönsson, M. A. Kazmi, M. J. Donoghue, and T. P. Sakmar. Recreating a
functional ancestral archosaur visual pigment. Molecular Biology and Evolution, 19(9):
1483–1489, 2002.
W. M. Fitch. Toward Defining the Course of Evolution: Minimum Change for a Specific Tree
Topology. Systematic Zoology, 20(4):406–416, 1971.
B. A. Malcolm, K. P. Wilson, B. W. Matthews, J. F. Kirsch, and A. C. Wilson. Ancestral
lysozymes reconstructed, neutrality tested, and thermostability linked to hydrocarbon
packing. Nature, 345(6270):86–89, 1990.
T. Pupko, I. Pe’er, R. Shamir, and D. Graur. A fast algorithm for joint reconstruction of
ancestral amino acid sequences. Molecular Biology and Evolution, 17(6):890–896, 2000.
D. L. Swofford and W. P. Maddison. Reconstructing Ancestral Character States Under Wagner
Parsimony. Mathematical Biosciences, 87:199–229, 1987.
J. A. Ugalde, B. S. W. Chang, and M. V. Matz. Evolution of coral pigments recreated. Science,
305(5689):1433, 2004.
Z. Yang, S. Kumar, and M. Nei. A new method of inference of ancestral nucleotide and amino
acid sequences. Genetics, 141(4):1641–1650, 1995.