Ancestral Sequence Reconstruction: methods and applications

Julien Dutheil <[email protected]>
BiRC – Bioinformatics Research Center, University of Århus
http://birc.au.dk/~jdutheil/Teaching/
February 2008

Introduction

... ancestral sequences?
Putative biological sequence (DNA, RNA, codon, protein) of an extinct organism.

... reconstruction?
Apart from a few particular cases (ancient DNA), ancestral sequences cannot be observed and have to be inferred from their contemporary homologues.

... so what?
The methodology is close to the study of the fossil record: it builds an image of the past in order to better understand what is going on in the present. In practice, fossil DNA is rare, particularly for very ancient times.

Outline of the lecture
1 Reconstructing ancestral sequences
2 Application: gene resurrection

Reconstructing ancestral sequences

Using Maximum Parsimony

Ancestral states are computed together with the score: Walter Fitch's algorithm [Fitch, 1971].
The ancestral set $S_x$ at node $x$, whose two child nodes carry the sets $S_{x,1}$ and $S_{x,2}$, is defined as
\[
S_x =
\begin{cases}
S_{x,1} \cap S_{x,2} & \text{if } S_{x,1} \cap S_{x,2} \neq \emptyset \\
S_{x,1} \cup S_{x,2} & \text{if } S_{x,1} \cap S_{x,2} = \emptyset
\end{cases}
\]
The parsimony score is obtained by counting the number of unions.
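
The set rule above can be sketched in a few lines of Python for a single site. This is only an illustration: the nested-tuple tree encoding, the function name `fitch_bottom_up` and the four-taxon example are assumptions made here, not material from the lecture.

```python
def fitch_bottom_up(tree, leaf_states):
    """Post-order pass of Fitch's algorithm for one site.

    tree: a leaf name (str) or a pair of subtrees (hypothetical encoding).
    leaf_states: dict mapping each leaf name to its observed state.
    Returns (state set of this node, parsimony score of the subtree).
    """
    if isinstance(tree, str):  # leaf: the set holds the observed state only
        return {leaf_states[tree]}, 0
    left, right = tree
    s1, score1 = fitch_bottom_up(left, leaf_states)
    s2, score2 = fitch_bottom_up(right, leaf_states)
    common = s1 & s2
    if common:  # non-empty intersection: no additional change
        return common, score1 + score2
    return s1 | s2, score1 + score2 + 1  # union: count one change


# Hypothetical four-taxon example, not taken from the lecture.
tree = (("a", "b"), ("c", "d"))
states = {"a": "A", "b": "C", "c": "A", "d": "A"}
root_set, score = fitch_bottom_up(tree, states)
print(sorted(root_set), score)
```

In this example the only union happens at the ancestor of leaves a and b, so the score is 1 and the root set reduces to the single state A.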

Getting ancestral states

To reconstruct the scenario of ancestral states, we need a second pass on the tree (prefix order, from the root to the leaves).
Several equally parsimonious scenarios are found in many cases.
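
A minimal sketch of such a second pass is given below, under stated assumptions. It uses a simplified rule: a node keeps the state of its parent when that state belongs to its own set, and otherwise picks one arbitrarily (alphabetical order here). This yields one most-parsimonious scenario; it is neither the complete set of Fitch's second-pass rules nor ACCTRAN or DELTRAN. The tree encoding, function names and data are hypothetical.

```python
def fitch_sets(node, leaf_states, sets):
    """Bottom-up pass: record the Fitch state set of every node in `sets`."""
    if isinstance(node, str):  # leaf
        sets[node] = {leaf_states[node]}
    else:
        s1, s2 = (fitch_sets(child, leaf_states, sets) for child in node)
        sets[node] = (s1 & s2) or (s1 | s2)
    return sets[node]


def assign_states(node, sets, parent_state=None, assignment=None):
    """Top-down (prefix) pass: choose one state per node."""
    assignment = {} if assignment is None else assignment
    candidates = sets[node]
    if parent_state in candidates:
        state = parent_state      # no change on the branch above this node
    else:
        state = min(candidates)   # arbitrary tie-break (an assumption)
    assignment[node] = state
    if not isinstance(node, str):
        for child in node:
            assign_states(child, sets, state, assignment)
    return assignment


def count_changes(node, assignment):
    """Number of branches whose two ends carry different states."""
    if isinstance(node, str):
        return 0
    return sum((assignment[child] != assignment[node])
               + count_changes(child, assignment) for child in node)


# Hypothetical example in which the root set is ambiguous.
tree = (("a", "b"), ("c", "d"))
leaf_states = {"a": "A", "b": "C", "c": "C", "d": "A"}
sets = {}
fitch_sets(tree, leaf_states, sets)
assignment = assign_states(tree, sets)
print(sorted(sets[tree]), assignment[tree], count_changes(tree, assignment))
```

Here the root set contains both A and C: starting the second pass from either state gives a scenario with two changes, which is the kind of tie the slide refers to.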
Swofford and Maddison [1987] introduced the ACCTRAN and DELTRAN methods to choose between these scenarios.
However, these methods do not account for branch lengths, and all substitutions are considered equal...

Models, parameters, random variables and likelihood

Parameters: tree (topology + branch lengths), substitution matrix (Q), ...
Random variables: ancestral states, evolutionary rate distribution, ...
Felsenstein's recursion:
\[
L_i(T_n, x_n) =
\begin{cases}
1 & \text{if } T_n \text{ is a leaf with state } x_n \text{ at site } i, \\
0 & \text{if } T_n \text{ is a leaf with state } \neq x_n \text{ at site } i, \\
\left( \sum_{x_{n_1}} p_{x_n,x_{n_1}}(t_{n_1}) \times L_i(T_{n_1}, x_{n_1}) \right)
\times
\left( \sum_{x_{n_2}} p_{x_n,x_{n_2}}(t_{n_2}) \times L_i(T_{n_2}, x_{n_2}) \right) & \text{otherwise.}
\end{cases}
\]
$n_1$ and $n_2$ are the child nodes of node $n$, and $t_{n_1}$ and $t_{n_2}$ are the lengths of the branches leading to them. All $p_{x_n,x_{n_1}}$ and $p_{x_n,x_{n_2}}$ are given by the matrix $e^{Q \times t}$.
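
The recursion can be illustrated with a short Python sketch. Several choices below are assumptions made for the example only: the Jukes-Cantor model (whose entries of $e^{Q \times t}$ have a closed form), uniform state frequencies at the root, the tree encoding, and the function names `jc69_prob` and `cond_lik`.

```python
import math

STATES = "ACGT"


def jc69_prob(x, y, t):
    """Jukes-Cantor probability p_{x,y}(t), an entry of e^{Qt}.

    t is measured in expected substitutions per site.
    """
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay


def cond_lik(node, leaf_states):
    """Return {x_n: L_i(T_n, x_n)} for the subtree rooted at `node`.

    A leaf is a name (str); an internal node is a pair of
    (child, branch_length) tuples (hypothetical encoding).
    """
    if isinstance(node, str):
        return {x: 1.0 if x == leaf_states[node] else 0.0 for x in STATES}
    result = {x: 1.0 for x in STATES}
    for child, t in node:
        child_lik = cond_lik(child, leaf_states)
        for x in STATES:
            # One factor of the product: sum over the child's states.
            result[x] *= sum(jc69_prob(x, y, t) * child_lik[y] for y in STATES)
    return result


# Hypothetical three-taxon tree and data for one site.
cherry = (("a", 0.1), ("b", 0.1))
root = ((cherry, 0.2), ("c", 0.3))
leaf_states = {"a": "A", "b": "A", "c": "G"}
lik = cond_lik(root, leaf_states)
# Site likelihood, assuming uniform root frequencies (1/4 each).
site_likelihood = sum(0.25 * lik[x] for x in STATES)
print(lik, site_likelihood)
```

Any other substitution model can be plugged in by replacing `jc69_prob` with a function that returns the corresponding entry of $e^{Q \times t}$.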

Estimation and reconstruction

Parameter estimation: maximum likelihood.
Reconstruction of hidden random variables: Bayesian approach
\[
\Pr(X \mid D, \Theta) = \frac{\Pr(D, X \mid \Theta)}{\Pr(D \mid \Theta)}
\]
[X = variable, D = data, Θ = parameters]
In this equation Θ is supposed to be known, which is actually not the case. Two approaches are used:
◮ Use a prior distribution for Θ (full Bayesian = hierarchical Bayesian)
◮ Use a "degenerate" distribution, Θ = Θ̂, the ML estimate of Θ (empirical Bayesian)
Empirical Bayesian approaches were first used by Ziheng Yang for the ancestral sequence reconstruction case [Yang et al., 1995].
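
As a small illustration of the formula above, the sketch below computes the posterior distribution of the root state of a two-leaf tree with Θ held fixed, in the spirit of the empirical Bayesian approach. The Jukes-Cantor model, the uniform prior on the root state, the branch lengths (standing in for an estimate Θ̂) and the function names are all assumptions made for this example.

```python
import math

STATES = "ACGT"


def jc69_prob(x, y, t):
    """Jukes-Cantor transition probability p_{x,y}(t)."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay


def root_posterior(obs1, obs2, t1, t2):
    """Pr(X = x | D, Theta) for the root X of a two-leaf tree.

    D is the pair of observed leaf states; Theta is the pair of branch
    lengths (t1, t2), treated as known.
    """
    # Pr(D, X = x | Theta), with a uniform distribution on the root state.
    joint = {x: 0.25 * jc69_prob(x, obs1, t1) * jc69_prob(x, obs2, t2)
             for x in STATES}
    evidence = sum(joint.values())  # Pr(D | Theta)
    return {x: joint[x] / evidence for x in STATES}


# Hypothetical data: leaf 1 shows A on a short branch, leaf 2 shows G
# on a longer branch.
post = root_posterior("A", "G", 0.1, 0.5)
for x in STATES:
    print(x, round(post[x], 3))
```

With these values the state observed at the end of the shorter branch (A) receives the highest posterior probability, followed by G, while C and T share the remainder equally.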

Marginal reconstruction (Yang)

[Figure: example tree whose leaves carry amino-acid states (V, A, V, D, D, E, V), with the internal node of interest marked "x?"]
We are interested in the state at a particular node (x).
The probability of each state at site i is given by:
\[
\Pr(X_i = x \mid D, \Theta) = \frac{\Pr(x, D \mid \Theta)}{\Pr(D \mid \Theta)} = \frac{L_{i,x}}{L_i}
\]
We keep the state with the maximum probability.

Joint reconstruction (Yang)

[Figure: the same tree, with all internal nodes marked "x1?" to "x6?"]
We are interested in the states at all nodes ({x1 ... xn}).
The probability of a given set of states at site i is given by:
\[
\Pr(X_{i1} = x_1, \ldots, X_{in} = x_n \mid D, \Theta) = \frac{\Pr(x_1, \ldots, x_n, D \mid \Theta)}{\Pr(D \mid \Theta)} = \frac{L_{i,x_1,\ldots,x_n}}{L_i}
\]
We keep the set of states with the maximum probability.
Problem: there are many possible sets of states!

In practice...
The PAML software (Phylogenetic Analysis by Maximum Likelihood) of Ziheng Yang can reconstruct ancestral sequences according to the marginal and joint methods.
Tal Pupko proposed a fast algorithm to reconstruct ancestral sequences according to the joint method, available in the FastML software [Pupko et al., 2000].
FastML also implements a heuristic algorithm for the joint method with a non-uniform distribution of substitution rates (Γ distribution).

Additional remarks

The set of marginal reconstructions may not be equal to the joint reconstruction. The chances that the two reconstructions differ increase with the presence of long branches in the tree.
The reconstructed sequences depend on the model and parameters, including of course the phylogeny. In most cases, it is very useful to compare the results obtained with different models.
There is uncertainty in the reconstruction process!
To assess this uncertainty, one can:
◮ check the second highest probability and compare it with the highest one,
◮ sample several sequences from the posterior distribution instead of keeping only the one with the highest probability.

Application: gene resurrection

Principle

Reconstruct one or several ancestral sequences in silico.
Synthesize the corresponding protein.
Study the biological characteristics of the ancestral protein.
[Figure: experimental pipeline starting from a reconstructed sequence (ATTAGCATCGATACTGCGTTGCGTGCCAAC): synthesis, amplification (PCR), cloning into a vector, expression in a cell, protein purification, analysis]

The sight of dinosaurs [Chang et al., 2002]

Visual pigment: rhodopsin, involved in vision in dim light.
The archosaurs include the extinct dinosaurs and living birds and crocodiles. Their actual physiology is to a large extent unknown.
The functional ancestral rhodopsin absorbs at 508 nm, which is "redder" than the rhodopsins of all known living vertebrates.
This result is consistent with the hypothesis of a nocturnal ancestor.

The color of corals [Ugalde et al., 2004]

Three colors: blue, green and red.
Convergences: which one was the ancestral state?
In silico reconstruction with nucleotide, codon and amino-acid models, resulting in a good consensus.

Galliforms' lysozyme [Malcolm et al., 1990]

The ancestor of ancestral reconstruction :)
Three sites of interest.
[Figure: 18 galliform species (pheasant, green pheasant, quail, California quail, guinea fowl, chicken, Japanese quail, turkey, satyr tragopan, Temminck's tragopan, Himalayan pheasant, blue peafowl, Reeves's pheasant, Lady Amherst's pheasant, copper pheasant, bobwhite quail, bare-faced curassow, plain chachalaca) with the residues observed at the three sites: THR / ILE / SER in 15 species and SER / VAL / THR in 3 species]
The authors used directed mutagenesis to reconstruct all possible ancestral sequences.
Synthesis, biochemical study and crystallization of all resulting proteins.

Results

All putative ancestral proteins are stable and active.
There is a relation between the total volume of the side chains and stability.
Some ancestral configurations, however, are more stable than the observed ones!
[Figure: thermostability (tm) plotted against side-chain volume for the eight variants SIT, TVT, TIS, SVT, SIS, TIT, TVS and SVS]

References

B. S. W. Chang, K. Jönsson, M. A. Kazmi, M. J. Donoghue, and T. P. Sakmar. Recreating a functional ancestral archosaur visual pigment. Molecular Biology and Evolution, 19(9):1483–1489, 2002.
W. M. Fitch. Toward Defining the Course of Evolution: Minimum Change for a Specific Tree Topology. Systematic Zoology, 20(4):406–416, 1971.
B. A. Malcolm, K. P. Wilson, B. W. Matthews, J. F. Kirsch, and A. C. Wilson. Ancestral lysozymes reconstructed, neutrality tested, and thermostability linked to hydrocarbon packing. Nature, 345(6270):86–89, 1990.
T. Pupko, I. Pe'er, R. Shamir, and D. Graur. A fast algorithm for joint reconstruction of ancestral amino acid sequences. Molecular Biology and Evolution, 17(6):890–896, 2000.
D. L. Swofford and W. P. Maddison. Reconstructing Ancestral Character States Under Wagner Parsimony. Mathematical Biosciences, 87:199–229, 1987.
J. A. Ugalde, B. S. W. Chang, and M. V. Matz. Evolution of coral pigments recreated. Science, 305(5689):1433, 2004.
Z. Yang, S. Kumar, and M. Nei. A new method of inference of ancestral nucleotide and amino acid sequences. Genetics, 141(4):1641–1650, 1995.